1 Estimands Introduction

Case study: Preliminary estimand attributes

  • Population: Patients with type 2 diabetes managed by diet and exercise alone
  • Treatments: test treatment; placebo
  • Variable: Change in HbA1c from baseline to Week 26
  • Summary measure: Mean difference

Type 2 diabetes example:

Potential clinical questions of interest:

  • What is the effect of test treatment with rescue medication as needed vs placebo with rescue medication as needed on HbA1c change from baseline to Week 26? (TP)
  • What is the effect of test treatment vs placebo on…
    • … HbA1c change from baseline to Week 26 as if rescue medication had not been made available? (H)
    • … achieving an improvement of x% in HbA1c from baseline to Week 26 with no need to take rescue medication at any time? (C)
    • … HbA1c change from baseline to Week 26 or the time of rescue medication usage, whichever occurs earlier? (W)
    • … HbA1c change from baseline to Week 26 in those patients who would not need rescue medication regardless of their treatment assignment? (P)

1.1 Treatment Policy Strategy Explained

Protect the randomization

The treatment policy strategy is a framework used in clinical trials and medical research to handle intercurrent events. These events are occurrences that happen during a study that may affect the outcome or treatment of participants but are not part of the planned treatment protocol. A treatment policy strategy takes a pragmatic approach to dealing with these intercurrent events.

Variable Value Usage Regardless of Intercurrent Events

In a treatment policy strategy:

  • The value of the variable of interest (such as a clinical outcome like blood sugar level) is used regardless of whether or not the intercurrent event occurs.
  • This approach treats unexpected events, such as the need for additional medication during the study, as part of the overall treatment strategy: the effect of the treatment is measured as if these events are integral to the treatment regimen.

Intercurrent Event as Part of the Treatment Strategy

  • Intercurrent events are treated as part of the overall treatment plan. This contrasts with other strategies that might censor or exclude data if an intercurrent event occurs.
  • By incorporating the intercurrent event into the treatment strategy, the analysis aims to reflect real-world treatment conditions, where patients may experience various changes in their treatment paths.

Example: Type 2 Diabetes Study

Treatments

  • Test Treatment with Rescue Medication: Patients receive a test medication for diabetes. If their blood sugar levels rise too high during the trial, they are given additional rescue medication to manage these levels.
  • Placebo with Rescue Medication: Similarly, the control group receives a placebo but can also use rescue medication as needed.

Clinical Question

  • The research question could be: What is the effect of the test treatment (with rescue medication as needed) compared to placebo (with rescue medication as needed) on HbA1c change from baseline to Week 26?
  • HbA1c is a marker that reflects a person’s average blood glucose levels over the past 2–3 months. The study aims to measure how this marker changes after 26 weeks of treatment.

Key Aspects of the Strategy in This Example

  • Rescue medication is considered an intercurrent event. Patients receive this medication if their blood sugar rises significantly during the study.
  • Under the treatment policy strategy, the impact of rescue medication is neither excluded nor ignored. Instead, it is integrated into the treatment strategy because:
    • In real-world scenarios, patients often need additional medication to control their blood sugar.
    • The clinical question seeks to measure the overall effectiveness of the test treatment, including how often patients require rescue medication and its combined effect on their HbA1c levels.

This strategy assesses how the test treatment (with the option for rescue medication) compares to placebo (with the option for rescue medication) in a real-world setting. It acknowledges that not every patient responds the same way to treatment and that intercurrent events like needing additional medication can occur.

Summary of the Treatment Policy Strategy

  • Intercurrent events, such as the use of rescue medication, are not disregarded but are incorporated into the analysis.
  • This strategy mirrors real-world treatment scenarios, aiming to address practical clinical questions about the overall effectiveness of the treatment, including its performance amidst unplanned events such as the need for additional medications.

This comprehensive approach provides a holistic view of the treatment’s impact over time, considering the effect of the treatment alongside any additional interventions.
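As a minimal sketch of the analysis this strategy implies, the example below computes a simple difference in means over all observed Week 26 values, regardless of rescue medication use. All data are simulated and every number (sample size, means, variances) is an illustrative assumption, not a value from any real trial:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100  # patients per arm (illustrative)

# Simulated Week 26 change in HbA1c. Some patients in each arm will have used
# rescue medication, but under the treatment policy strategy their observed
# values are kept as-is: the intercurrent event does not alter the variable.
test = rng.normal(-1.0, 0.8, n)      # test treatment arm
placebo = rng.normal(-0.3, 0.8, n)   # placebo arm

# Treatment policy estimate: difference in means of all observed Week 26
# values, regardless of rescue medication use.
effect = test.mean() - placebo.mean()
print(f"Estimated mean difference in HbA1c change: {effect:.2f}")
```

The point is only that no observation is censored or excluded because of the intercurrent event; a real analysis would additionally model covariates and the visit structure.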

1.2 Composite Strategy Explained

In the composite strategy, the occurrence of an intercurrent event (such as the need for additional medication or other unexpected outcomes) is seen as informative about the patient’s overall outcome. This strategy incorporates the intercurrent event directly into the clinical outcome or variable being studied, effectively altering the way the variable is analyzed.

Intercurrent Event as Informative

Unlike the treatment policy strategy, where the intercurrent event is treated as part of the overall treatment process but not directly tied to the outcome variable, the composite strategy views the intercurrent event as providing critical information about the patient’s outcome. If a patient experiences an intercurrent event, such as needing rescue medication, it affects how the treatment outcome is interpreted.

Altering the Outcome Variable

The key difference in this strategy is that the outcome variable is modified based on the occurrence of the intercurrent event. The variable’s definition is changed to reflect both the intended treatment outcome and the intercurrent event. For instance, instead of just looking at whether the patient’s blood sugar improved, the outcome might also account for whether the patient needed rescue medication, combining these into a single variable that captures more information about the treatment’s overall success.

Example: Type 2 Diabetes Study

In this case, the intercurrent event is the use of rescue medication, and it provides significant information about the patient’s response to the treatment. The strategy will incorporate this into the analysis of the treatment effect.

Defining a Composite Outcome

One way to apply a composite strategy is by dichotomizing (splitting the outcome into two categories) the variable based on both HbA1c improvement and whether or not rescue medication was needed:

  • Responder: A patient who:
    • Achieves an improvement of x% in HbA1c from baseline to Week 26 AND
    • Does not need to take rescue medication at any point during the study.
  • Non-Responder: A patient who:
    • Fails to achieve the target HbA1c improvement OR
    • Needs rescue medication at any time during the study.

By classifying patients as “responders” or “non-responders” based on these two criteria, the intercurrent event (needing rescue medication) is considered to be part of the outcome rather than just something that happened along the way.
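The responder/non-responder classification above can be sketched in a few lines. The threshold `target` stands in for the unspecified "x%" criterion, and all patient-level data below are simulated, illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # patients (illustrative)

# Hypothetical patient-level data.
hba1c_change = rng.normal(-0.8, 0.6, n)   # change from baseline to Week 26
used_rescue = rng.random(n) < 0.25        # ever used rescue medication
target = -0.5                             # stand-in for the "x%" improvement

# Composite responder: achieves the HbA1c target AND never uses rescue
# medication; everyone else is a non-responder.
responder = (hba1c_change <= target) & ~used_rescue
print(f"Responder rate: {responder.mean():.1%}")
```

The treatment effect would then be summarized on this composite variable, e.g. as a difference in responder rates between arms.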

Clinical Question

In this example, the clinical question might be framed as:

What is the effect of the test treatment vs placebo on achieving an improvement of x% in HbA1c from baseline to Week 26, without the need for rescue medication at any point during the study?

This question looks at two components:

  • Improvement in HbA1c: How effective is the treatment at lowering blood sugar levels over time?
  • Use of Rescue Medication: How effective is the treatment in maintaining blood sugar control without requiring additional medications?

How This Composite Strategy Works

  • Combination of Variables: The outcome variable now combines two aspects:
    1. Biochemical improvement (HbA1c reduction),
    2. Clinical management success (no need for additional medications).

By creating a composite outcome, this strategy provides a more holistic measure of the treatment’s effectiveness. It answers the clinical question of whether the treatment not only improves blood sugar levels but also maintains stability without needing extra intervention.

Key Points of the Composite Strategy

  • Informative Event: The intercurrent event (such as using rescue medication) tells us something critical about the patient’s health and treatment success.
  • Modified Variable: The variable used to measure success is changed to incorporate the intercurrent event, providing a richer picture of the treatment’s overall effect.
  • Dichotomization: The composite variable can be dichotomized (e.g., responder vs non-responder) based on criteria that include the intercurrent event.

Summary of the Composite Strategy

  • In the composite strategy, the intercurrent event (like needing rescue medication) is considered to provide meaningful information about the outcome and is integrated into the analysis.
  • The clinical outcome variable is modified to reflect both the primary outcome (e.g., HbA1c improvement) and the occurrence of the intercurrent event, giving a more comprehensive assessment of the treatment’s effectiveness.
  • In the Type 2 diabetes example, the outcome could be dichotomized into responders (those who both improve and don’t need rescue medication) and non-responders (those who either fail to improve or need rescue medication).

This approach helps to evaluate the treatment’s success from a broader, more inclusive perspective.

1.3 Hypothetical Strategies Explained

A hypothetical strategy involves imagining a scenario where certain events, such as intercurrent events, did not occur and estimating what the treatment effect would have been in that scenario. This strategy is frequently used in clinical trials to explore “what if” questions. It can be particularly useful for understanding what the treatment’s impact might have been under different conditions, but it also comes with challenges related to feasibility, clinical plausibility, and interpretation.

What is a Hypothetical Strategy?

A hypothetical strategy asks the question: What would the treatment effect be in a scenario where some intercurrent event (such as taking rescue medication) did not happen? This creates an alternative scenario where an event like rescue medication is imagined not to have occurred, or where patients behave in a particular way (e.g., adhering strictly to treatment). The goal is to estimate the treatment effect under this hypothetical condition, which often requires assumptions about what would have happened. For instance, if analyzing the effect of a diabetes drug on blood sugar levels (HbA1c) in a study where some patients took rescue medication, a hypothetical strategy might ask: What would the treatment effect have been if rescue medication had never been made available?

Key Questions When Considering Hypothetical Scenarios

Hypothetical strategies come with various considerations:

  • What would the treatment effect be if patients had not taken rescue medication? This is a common hypothetical scenario but requires careful thought: is it plausible to imagine patients not having access to rescue medication in real-world settings?
  • What would the treatment effect be if rescue medication had not been made available? This could be more plausible in placebo-controlled trials, where we want to know the true effect of the test drug without rescue medication being used.
  • What would the treatment effect be if patients had not needed rescue medication due to lack of efficacy? This scenario is typically less relevant because it assumes that patients simply would not need rescue medication, which is not something that can be controlled or predicted in practice.

The Challenge of Precision in Defining Hypothetical Scenarios

A major challenge with hypothetical strategies is that they can introduce ambiguity. There is a broad range of possible hypothetical scenarios, each with different clinical relevance, so it is essential to define them precisely. Speaking of “THE hypothetical strategy” leaves too much room for vagueness. Hypothetical strategies need to be carefully formulated to ensure they make sense in a clinical context: the strategy must define a clear hypothetical condition and outline exactly how the intercurrent event is imagined not to occur.

Examples of Hypothetical Scenarios:

  • Relevant Hypotheticals: In placebo-controlled trials, it might be reasonable to ask what the effect of the treatment would be if rescue medication were not made available. This scenario is plausible because it reflects a condition where additional intervention is restricted.
  • Irrelevant Hypotheticals: A scenario where patients fully adhere to treatment despite serious adverse events is unlikely to be useful. It is not realistic to assume that patients would always stick to treatment in real-world clinical settings, so such a scenario lacks clinical relevance.

When is a Hypothetical Scenario Relevant?

Certain considerations help determine whether a hypothetical strategy is relevant:

  • Can You Intervene on the Intercurrent Event? If an intervention can be made on the intercurrent event (such as limiting the use of rescue medication), then a hypothetical scenario can be useful. If an event is uncontrollable (such as how patients react to treatment), then a hypothetical strategy may not be of interest because it is too far removed from reality.
  • Clinical Plausibility: The clinical plausibility of the hypothetical scenario is key. If the scenario cannot plausibly happen in clinical practice (like patients always adhering to treatment despite severe side effects), then it may not offer meaningful insights. Relevant scenarios are usually those that change the study design or treatment options, while changing patient behaviors tends to lead to irrelevant scenarios.

Case Study: Type 2 Diabetes Example

In the context of a Type 2 diabetes trial, a hypothetical scenario might be: What if rescue medication had not been made available?

Clinical Question: What is the effect of test treatment vs placebo on HbA1c change from baseline to Week 26 as if rescue medication had not been made available?

In this case, the hypothetical scenario assumes that no rescue medication was provided, meaning the analysis would estimate the treatment’s effect on blood sugar control (HbA1c) without any additional medications to influence the results.

Regulatory Guidance: The European Medicines Agency (EMA), in its 2024 guidance, suggests using a hypothetical strategy for rescue medication in type 2 diabetes studies. Specifically, it recommends analyzing data under the assumption that rescue medication or other medications influencing HbA1c values were not introduced.

Estimation Challenges

Estimating estimands (the treatment effects of interest) under a hypothetical strategy often requires missing-data or causal inference methods. Since the scenario is hypothetical, and some patients did receive rescue medication, advanced statistical techniques are needed to estimate what would have happened if they had not. Methods from causal inference and for handling missing data (for patients who took rescue medication) play a role in these estimations.

Summary of Hypothetical Strategies:

  • Hypothetical strategies explore “what if” scenarios, imagining treatment effects in situations where intercurrent events (like taking rescue medication) did not occur.
  • These strategies are relevant when the hypothetical scenario is clinically plausible, such as removing the option for rescue medication in a trial, but they lose relevance when the scenario assumes unrealistic patient behaviors.
  • Defining the hypothetical scenario clearly is crucial to avoid ambiguity and ensure that the strategy is meaningful for clinical practice.
  • Estimation of hypothetical estimands typically requires advanced statistical methodologies to deal with missing data and causal inference challenges.
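One crude way to estimate such a hypothetical estimand is to treat post-rescue measurements as missing and impute them from rescue-free patients. The sketch below does simple conditional-mean imputation within a single arm on simulated data; real analyses would use multiple imputation or likelihood-based methods, and every number here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120  # patients in one arm (illustrative)

# Simulated data: under the hypothetical strategy, Week 26 values measured
# after rescue medication use are treated as missing.
baseline = rng.normal(8.5, 0.7, n)
week26 = baseline + rng.normal(-1.0, 0.5, n)
used_rescue = rng.random(n) < 0.3

observed = ~used_rescue  # rescue-free patients inform the hypothetical world
# Fit a simple baseline -> Week 26 regression on rescue-free patients only.
slope, intercept = np.polyfit(baseline[observed], week26[observed], 1)

# Impute the "as if rescue had not been available" Week 26 values.
week26_hypo = week26.copy()
week26_hypo[used_rescue] = slope * baseline[used_rescue] + intercept

print(f"Hypothetical mean Week 26 HbA1c: {week26_hypo.mean():.2f}")
```

Note the strong assumption built in: rescue use is treated as ignorable given baseline, which in practice would need justification or sensitivity analyses.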

1.4 Principal Stratum Strategy Explained

The principal stratum strategy is used to address intercurrent events in clinical trials by focusing on specific subgroups (or strata) of patients based on how they would respond to treatment or placebo in relation to the occurrence of an intercurrent event. This strategy seeks to isolate the treatment effect within well-defined subgroups of the population, often aiming to clarify the effect in people with particular characteristics related to the event.

Principal Stratification: Splitting the Population

The principal stratum strategy involves dividing the study population into four distinct strata based on their expected response to treatment or placebo in terms of whether or not they experience the intercurrent event.

The Four Strata:

  1. Stratum 1: Patients who would not experience the intercurrent event regardless of whether they are assigned to the test treatment or placebo. These are people for whom the intercurrent event would not happen under either treatment.

  2. Stratum 2: Patients who would experience the intercurrent event regardless of whether they are assigned to the test treatment or placebo. For these people, the intercurrent event will occur no matter which treatment they receive.

  3. Stratum 3: Patients who would not experience the intercurrent event if assigned to the test treatment, but would experience it if assigned to the placebo. These patients benefit from the test treatment in terms of avoiding the intercurrent event.

  4. Stratum 4: Patients who would not experience the intercurrent event if assigned to the placebo, but would experience it if assigned to the test treatment. These are patients for whom the test treatment could increase the risk of the intercurrent event compared to the placebo.

Clinical Question Within a Principal Stratum

Once these strata are defined, the principal stratum strategy seeks to answer clinical questions within a specific stratum. For instance, instead of looking at the overall population, the study focuses on one subgroup (or stratum), such as patients who would not need rescue medication if they received the test treatment but would need it if they were on the placebo.

This changes the population attribute of the study because the analysis is restricted to just one stratum rather than the entire population. The objective is to estimate the treatment effect specifically for this subgroup, making the results more targeted.

Principal Strata Are Not Identifiable with Certainty

One of the main challenges with the principal stratum strategy is that the strata are not always identifiable with certainty. In other words, it can be difficult or impossible to know for sure which patients fall into which stratum based on observed data. This is because we cannot observe both potential outcomes (e.g., what happens under test treatment and what happens under placebo) for the same individual at the same time.

To overcome this, statistical methods are used to estimate the membership in each stratum, but these estimates can come with some level of uncertainty.
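To make the identifiability point concrete: individual stratum membership cannot be read off the data, but under an extra assumption the stratum proportions become estimable. The sketch below assumes monotonicity (Stratum 4 is empty) and uses made-up rescue rates, not trial data:

```python
# Illustrative observed rescue rates (assumptions, not trial data).
p_rescue_placebo = 0.40  # proportion needing rescue in the placebo arm
p_rescue_test = 0.15     # proportion needing rescue in the test arm

# Under monotonicity (no patient needs rescue on test treatment but not on
# placebo, i.e. Stratum 4 is empty), the stratum proportions are identified:
never_rescue = 1 - p_rescue_placebo                        # Stratum 1
always_rescue = p_rescue_test                              # Stratum 2
rescue_on_placebo_only = p_rescue_placebo - p_rescue_test  # Stratum 3

print(never_rescue, always_rescue, rescue_on_placebo_only)
```

Even then, which individual belongs to which stratum remains unknown, and the monotonicity assumption itself cannot be verified from the observed data.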

Response to Treatment Before the Intercurrent Event

In some cases, the response to treatment prior to the occurrence of the intercurrent event is of interest. This is particularly important when considering events like death, where the outcome after the event can no longer be measured. In such cases, the while-alive strategy or while-on-treatment strategy may be applied.

  • While-alive Strategy: If the intercurrent event is death, the focus is on the treatment effect while the patient is still alive. After death, the outcome can no longer be measured, so the analysis is constrained to the time before the event.

  • While-on-treatment Strategy: This strategy focuses on analyzing the data while the patient is still on the assigned treatment. If patients stop treatment early, their outcomes are considered only until the point of discontinuation. This can be difficult to interpret, especially if the treatment durations vary significantly between groups.

Type 2 Diabetes Example

Let’s apply this to a Type 2 diabetes example:

Variable of Interest

In a study where the intercurrent event is the use of rescue medication, the variable might be: HbA1c change from baseline to Week 26 in those patients who would not need rescue medication regardless of their treatment assignment.

This variable restricts attention to the principal stratum of patients for whom rescue medication would never be needed under either treatment, so their HbA1c values at Week 26 are unaffected by rescue medication.

Clinical Question

The clinical question here might be: What is the effect of test treatment vs placebo on HbA1c change from baseline to Week 26 in those patients who would not need rescue medication regardless of their treatment assignment?

This analysis would estimate the effect of the test treatment within a stratum that is unaffected by the intercurrent event under either assignment. Note the contrast with the “whichever occurs earlier” variable, which belongs to the while-on-treatment strategy described in Section 1.5.

Key Points of the Principal Stratum Strategy

  • Focus on Subgroups: The strategy splits the population into strata based on whether patients would experience the intercurrent event under each treatment, allowing for a more focused analysis of specific subgroups.
  • Changing Population Attribute: The analysis is not on the entire population but on a specific stratum, so the population attribute of the estimand is changed.
  • Uncertainty in Strata Membership: Identifying the exact members of each stratum can be difficult, and statistical methods must be used to estimate membership with some level of uncertainty.
  • Relation to Other Strategies: When interest lies in the response before the intercurrent event occurs (e.g., HbA1c levels before patients need rescue medication), the while-alive or while-on-treatment strategies may be applied instead.

Summary of the Principal Stratum Strategy

  • The principal stratum strategy focuses on estimating the treatment effect within a particular subgroup (or stratum) of patients, defined by how they would respond to treatment or placebo in relation to an intercurrent event.
  • This strategy changes the population being studied, as the analysis is limited to a specific stratum rather than the entire population.
  • Strata are based on whether patients would experience the intercurrent event under each treatment assignment, but membership is not identifiable with certainty, requiring advanced statistical estimation.
  • In the Type 2 diabetes example, the analysis might focus on HbA1c change from baseline to Week 26 in the stratum of patients who would not need rescue medication regardless of their treatment assignment.

This approach allows researchers to gain insights into how specific groups of patients respond to treatment, offering more targeted information than a population-wide analysis.

1.5 While-on-Treatment Strategy Explained

The while-on-treatment strategy focuses on the response to treatment that occurs prior to the occurrence of an intercurrent event (such as discontinuation of treatment, use of rescue medication, or death). This strategy is designed to estimate the treatment effect only while the patient remains on the assigned treatment and before any major event occurs that would disrupt the planned treatment regimen.

Response to Treatment Prior to the Intercurrent Event

The while-on-treatment strategy focuses on analyzing how patients respond to the treatment up until the point that an intercurrent event occurs. Once the event happens, such as the need for rescue medication or death, the data after that point is no longer used to measure treatment efficacy.

For example:

  • In a diabetes trial, the focus would be on how much the test treatment affects HbA1c (a measure of blood sugar control) before the patient requires rescue medication.
  • In trials where death is the intercurrent event, the strategy might be called the while-alive strategy, where the treatment’s impact is measured while the patient is still alive.

Challenges of Interpretation

One of the main challenges with the while-on-treatment strategy is that it can be difficult to interpret the results, particularly when the duration of treatment differs significantly between treatment arms.

  • Unequal treatment durations: If patients in one treatment group remain on the treatment longer than those in the other group, comparisons between the two groups can become difficult. This is because the treatment effects might appear stronger in one group simply due to the fact that patients remained on treatment longer, rather than because the treatment is more effective.

In such cases, the treatment effect estimates may be biased or not fully comparable across groups.

Changes in the Variable and Summary Measure

This strategy changes the variable attribute (the way the outcome is measured) because the outcome is only measured up until the intercurrent event occurs. After the event, the data is not considered. Additionally, the summary measure (such as the average HbA1c reduction over time) may also change because the analysis only includes the time when patients are still on treatment.

In other words, the effectiveness of the treatment is evaluated based on the period when the patient is actively receiving it and before any other interventions (like rescue medication) occur.

Type 2 Diabetes Example

Let’s consider a Type 2 diabetes example:

Variable of Interest:

  • The variable here could be HbA1c change from baseline to Week 26, or the time of rescue medication usage, whichever occurs earlier.

This means that for patients who need rescue medication during the trial, their HbA1c change is measured only until the point when they require rescue medication. After that, the data is no longer used, as the intercurrent event (rescue medication) alters the outcome being measured.

Clinical Question:

  • The clinical question might be: What is the effect of test treatment vs placebo on HbA1c change from baseline to Week 26, or time of rescue medication usage, whichever occurs earlier?

This analysis would estimate the treatment effect on HbA1c before the intercurrent event (i.e., rescue medication) occurs. This means that the focus is on how effective the treatment is at controlling blood sugar up until the point when the patient needs additional intervention.
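Constructing the “whichever occurs earlier” variable is mostly bookkeeping over the visit schedule. A minimal sketch on simulated visits follows; the schedule, trajectories, and rescue times are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
weeks = np.array([0, 4, 8, 12, 26])  # scheduled visit weeks

# Simulated HbA1c trajectories, one row per patient (illustrative).
hba1c = 8.5 + np.cumsum(rng.normal(-0.2, 0.1, (n, len(weeks))), axis=1)
# Week of first rescue medication use; np.inf means never rescued.
rescue_week = np.array([np.inf, 10, np.inf, 5, 26, np.inf])

# Variable: change from baseline to Week 26 OR the last visit before rescue
# medication use, whichever occurs earlier.
changes = []
for i in range(n):
    valid = weeks[weeks < rescue_week[i]]       # visits before rescue use
    j = np.where(weeks == valid.max())[0][0]    # index of last usable visit
    changes.append(hba1c[i, j] - hba1c[i, 0])   # change from baseline
print(np.round(changes, 2))
```

Patients who never need rescue contribute their full baseline-to-Week-26 change; rescued patients contribute only the change up to their last pre-rescue visit.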

Key Points of the While-on-Treatment Strategy

  • Response before the event: The primary focus is on analyzing the treatment’s effect before the intercurrent event, which means that the data used are limited to the time before rescue medication (or another event such as death) is introduced.

  • Variable and summary measure change: The way the outcome is measured changes because data collection stops once the intercurrent event occurs. This may also alter the summary measures used to describe the treatment effect (e.g., average HbA1c reduction over time).

  • Interpretation challenges: If the treatment duration differs between the two arms (e.g., one group receives treatment for a longer period before needing rescue medication), comparing the results between groups may become difficult and could introduce bias.

Summary of the While-on-Treatment Strategy

  • The while-on-treatment strategy focuses on estimating the treatment effect while patients remain on the assigned treatment and before any intercurrent event (like rescue medication or death) occurs.
  • This approach is useful when the goal is to understand the effect of a treatment up until the point where other interventions are introduced or patients stop treatment.
  • The strategy can introduce challenges in interpreting results, especially when treatment durations differ between groups, which may lead to biased or unclear comparisons.
  • In the Type 2 diabetes example, this strategy would analyze HbA1c changes from baseline to Week 26 or until the patient needed rescue medication, providing insight into the treatment’s impact before additional interventions are required.

This strategy is particularly relevant when it is important to measure the treatment effect in isolation before other factors come into play, but it requires careful consideration of potential biases introduced by differing treatment durations across groups.

1.6 Causal Estimands

Causal estimands are a critical concept in the field of statistics, particularly when it comes to understanding the effect of interventions in clinical trials and observational studies. They are designed to estimate the impact of a treatment by considering what the outcome would have been under different treatment conditions.

  1. Concept of Causal Estimands

Causal estimands are aimed at answering “what if” questions in a formal, quantitative way. They focus on understanding the effect of a specific treatment by asking how the outcomes would differ if the treatment were applied versus not applied, or if an alternative treatment were used. This approach aligns with causal inference, which seeks to infer the cause-and-effect relationship from data.

  2. Framework of Potential Outcomes

The potential outcomes framework is fundamental to causal inference and was originally formalized by Donald Rubin. It considers every subject in a study to have a potential outcome under each treatment condition. For example:

  • \(Y(1)\): Outcome if the subject receives the treatment.
  • \(Y(0)\): Outcome if the subject does not receive the treatment.

These potential outcomes help to define the causal effect of the treatment, which cannot be observed directly since we can only observe one of these outcomes for each individual — the one corresponding to the treatment they actually received.

  3. Causal Estimand Formula

The basic causal estimand in a randomized controlled trial (RCT) can be expressed as:

\[E(Y(1)) - E(Y(0))\]

This represents the expected difference in outcomes between subjects assigned to the treatment versus those assigned to the control. This difference is what statisticians aim to estimate through the trial.
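The formula can be illustrated with a small simulation of potential outcomes. In real data only one of \(Y(1)\), \(Y(0)\) is observed per subject; the sketch below (with assumed, purely illustrative parameters) shows that under randomization the observed difference in group means recovers \(E(Y(1)) - E(Y(0))\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Simulated potential outcomes for every subject (illustrative parameters);
# the true average treatment effect is -0.5.
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 - 0.5 + rng.normal(0.0, 0.3, n)

true_estimand = y1.mean() - y0.mean()   # E[Y(1)] - E[Y(0)]

# In a trial we observe only one outcome per subject; randomization makes
# the simple difference in observed group means an unbiased estimator.
assigned = rng.random(n) < 0.5          # randomized treatment assignment
estimate = y1[assigned].mean() - y0[~assigned].mean()

print(f"true: {true_estimand:.3f}, estimated: {estimate:.3f}")
```

The simulation is possible only because we generated both potential outcomes ourselves; in practice the estimand is defined on quantities that are never jointly observed.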

  4. Challenges in Observational Studies

In observational studies, where treatments are not randomly assigned, estimating causal effects becomes more complex due to potential confounding factors. Here, additional models and assumptions about how treatments are assigned to patients (assignment models) and how outcomes are generated (outcome models) are necessary. These models help to adjust for factors that may influence both the treatment assignment and the outcomes.
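A standard adjustment technique in this setting is inverse probability weighting with the propensity score. The sketch below simulates a single confounder and, for simplicity, assumes the propensity score is known (in practice it must be estimated from an assignment model); all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# Simulated observational data: the confounder x drives both treatment
# assignment and the outcome, so the naive comparison is biased.
x = rng.normal(0.0, 1.0, n)
p_treat = 1.0 / (1.0 + np.exp(-x))        # sicker patients treated more often
treated = rng.random(n) < p_treat
y = 1.0 * treated - 2.0 * x + rng.normal(0.0, 1.0, n)  # true effect = 1.0

naive = y[treated].mean() - y[~treated].mean()  # confounded estimate

# Inverse probability weighting using the (here, known) propensity score.
w = np.where(treated, 1.0 / p_treat, 1.0 / (1.0 - p_treat))
ipw = (np.sum(w * y * treated) / np.sum(w * treated)
       - np.sum(w * y * (~treated)) / np.sum(w * (~treated)))
print(f"naive: {naive:.2f}, IPW-adjusted: {ipw:.2f}")
```

Here the naive comparison is badly biased (the treated group is sicker), while the weighted comparison recovers an estimate close to the true effect of 1.0.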

  5. International Council for Harmonization (ICH) and Causal Estimands

The ICH guidelines emphasize the importance of causal estimands in clinical trials, suggesting that the trials should be designed to answer specific causal questions. Even though the term “causal” is not explicitly used, the guidelines align with causal reasoning principles to ensure that the results of clinical trials are robust and interpretable in terms of causal effects.

  6. Statistical Inference for Causal Estimands

Statistical methods are employed to estimate causal estimands from the observed data. In RCTs, this often involves comparing the observed outcomes between the treatment and control groups, leveraging the randomization to argue that these groups are comparable. In non-randomized studies, more sophisticated statistical techniques, such as instrumental variable analysis, propensity score matching, or structural models, are required.

  7. Importance for Regulatory Authorities

Regulatory authorities, such as the FDA or EMA, are particularly interested in causal estimands because they provide a clear basis for regulatory decisions regarding drug approvals. By focusing on causal estimands, regulators can better understand the true effect of a drug, independent of confounding treatments or patient characteristics.

1.7 Missing data vs intercurrent event

Intercurrent events are incidents that occur after the initiation of treatment and may impact the interpretation of the trial outcomes or affect the continuity of the measurements related to the clinical question of interest. These can include events such as patients starting additional therapies, experiencing side effects leading to treatment discontinuation, or any other circumstances that alter the course of standard treatment administration.

Missing data refers to information that was intended to be collected but was not, due to various reasons such as patients dropping out of the study, missing visits, or failure to record certain outcomes. It’s important to distinguish between data that is missing because it was not collected (but could have been under different circumstances) and data that is considered not meaningful due to an intercurrent event.

Handling Strategies for Intercurrent Events and Missing Data

1. Treatment Policy Strategy: - Approach: Uses all data through the planned endpoint, including measurements collected after the intercurrent event, regardless of subsequent therapies or treatment changes. - Missing Data Issue: A missing data problem arises when patients are lost to follow-up after the intercurrent event, since their outcomes through the endpoint must then be imputed from the observed data.

2. Hypothetical Strategy: - Approach: Assumes a scenario where the intercurrent event, such as treatment discontinuation, does not occur. - Missing Data Issue: The target is inherently unobserved. Even data collected after the intercurrent event are not directly relevant to the hypothetical scenario; the analysis must instead impute what would have been observed had the patient remained in the trial under the initial treatment conditions.

3. Composite Strategy: - Approach: Combines multiple elements or outcomes into a single variable that incorporates the intercurrent event as part of the variable of interest. - Missing Data Issue: Typically, there is no missing data concern under this strategy as the intercurrent events are accounted for within the composite outcome measure.

4. While-on-Treatment Strategy: - Approach: Analyzes the response to treatment only up to the point of an intercurrent event. - Missing Data Issue: There is generally no missing data because the analysis only includes data collected while the patients were on treatment and before any intercurrent event.

5. Principal Stratum Strategy: - Approach: Focuses on specific subgroups (strata) that are not affected by the intercurrent events, based on their potential outcomes under different treatment scenarios. - Missing Data Issue: This strategy avoids missing data issues by defining the population such that the intercurrent event is not considered relevant for the stratum of interest. It inherently excludes patients from the analysis if they are outside the target strata.
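To make the contrast concrete, here is a minimal sketch with invented data for a single patient who takes rescue medication at week 16 of a 26-week trial, showing how three of these strategies define the analysis variable:

```python
# Illustrative, hypothetical single-patient data: HbA1c by visit week,
# including a value measured after rescue medication was started.
patient = {
    "hba1c": {0: 8.0, 13: 7.4, 26: 7.1},  # observed values, incl. post-rescue
    "rescue_week": 16,                     # None if rescue was never taken
}

# Treatment policy: use the week-26 value regardless of rescue
tp_change = round(patient["hba1c"][26] - patient["hba1c"][0], 1)

# Composite: responder only if the HbA1c target is met AND no rescue was needed
composite_responder = patient["hba1c"][26] <= 7.0 and patient["rescue_week"] is None

# While on treatment: use the last value observed before the intercurrent event
cutoff = patient["rescue_week"] if patient["rescue_week"] is not None else 99
last_visit = max(w for w in patient["hba1c"] if w < cutoff)
wot_change = round(patient["hba1c"][last_visit] - patient["hba1c"][0], 1)

print(tp_change, composite_responder, wot_change)
```

Note how the same patient contributes a different value under each strategy: the full week-26 change under treatment policy, a non-response under the composite rule, and the pre-rescue change while on treatment.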

1.8 References

  1. Bornkamp B, et al. Principal stratum strategy: potential role in drug development. Pharmaceutical Statistics, 2021.

  2. EMA. Guideline on clinical investigation of medicinal products in the treatment or prevention of diabetes mellitus. CPMP/EWP/1080/00 Rev. 2, 2024.

  3. Olarte Parra C, Daniel RM, Bartlett JW. Hypothetical estimands in clinical trials: a unification of causal inference and missing data methods. Statistics in Biopharmaceutical Research, 2022.

  4. Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: Wiley, 1987.

2 Defining Estimands

2.1 Scientific Question of Interest

Strategies for Addressing Intercurrent Events:

  • Treatment Policy Strategy: This approach uses the value for a variable regardless of whether or not the intercurrent event occurs. It considers the intercurrent event as part of the treatment strategies being compared. This means that the analysis accepts the intercurrent event as a natural part of the treatment process and incorporates its occurrence into the overall treatment evaluation.

  • Hypothetical Strategies: These involve imagining a scenario in which the intercurrent event does not occur. This strategy is used to assess what the outcome would have been if the intercurrent event had been entirely absent, providing a clearer picture of the treatment’s effect without the confounding impact of the event.

  • Composite Variable Strategies: This approach integrates the intercurrent event into the outcome variable itself, recognizing that the event provides meaningful information about the patient’s outcome. By incorporating the intercurrent event into the analysis variable, this strategy allows for a comprehensive evaluation of the treatment’s effect, including how it relates to the occurrence of specific events.

  • While on Treatment Strategies: This approach analyzes data only from the period during which patients are actively receiving treatment, disregarding data collected after treatment discontinuation or other deviations.

  • Principal Stratum Strategies: This approach focuses on a subgroup of participants who are unaffected by the intercurrent event. It aims to isolate the effect of the treatment in a “pure” form by evaluating only those who would adhere to the treatment regimen regardless of potential intercurrent events.

2.2 Thinking Process

  1. Therapeutic Setting and Intent of Treatment Determine the Trial Objective:
    • This initial step involves defining the therapeutic context and the specific goals of the treatment under investigation. It sets the foundation for the trial by clarifying its primary objectives based on the medical need and intended therapeutic benefits.
  2. Identify Intercurrent Events:
    • Intercurrent events are occurrences that could potentially affect the interpretation of the trial’s outcome. This step involves identifying all possible events such as additional treatments, protocol deviations, or loss of follow-up that may interfere with the trial results.
  3. Discuss Strategies to Address Intercurrent Events:
    • Once intercurrent events are identified, this step focuses on developing strategies to manage them. These strategies ensure that the trial can proceed as smoothly as possible and that the data remains reliable despite these events.
  4. Agree on the Estimand(s):
    • An estimand is a precise description of the effect to be estimated by the trial. This step involves reaching consensus on what exactly the trial aims to estimate regarding the treatment effect, taking into account the strategies for handling intercurrent events.
  5. Align Choices on Trial Design, Data Collection, and Method of Estimation:
    • This step is about making informed decisions on the trial design, how data will be collected, and the methods used for estimating the treatment effect. These choices are crucial for ensuring that the trial will effectively address its objectives.
  6. Identify Assumptions for the Main Analysis and Suitable Sensitivity Analyses to Investigate These Assumptions:
    • Assumptions related to the trial’s main analysis are identified here. Sensitivity analyses are planned to test these assumptions, helping to understand how robust the findings are to changes in the assumptions.
  7. Document the Chosen Estimands:
    • The final step involves formally documenting the agreed-upon estimands. This documentation is vital for clarity and ensures that all stakeholders have a clear understanding of what the trial aims to estimate and how it will be done.

2.3 Case Study 1: Treatment efficacy in patients with chronic inflammatory conditions

2.3.1 Background

  1. Study Purpose
    • The primary objective of the study is to demonstrate the superiority of a novel biologic treatment over placebo in managing patients with a chronic inflammatory condition.
  2. Novel Treatment
    • The treatment is a biologic administered once per month, aimed at controlling the inflammatory condition.
  3. Variable
    • The primary outcome or endpoint of the study is the clinical response at 12 months, which is measured as a binary variable (yes/no response).
  4. Study Design
    • The trial is set up as a double-blind, placebo-controlled, randomized, parallel-group design. This ensures that neither the participants nor the researchers know which treatment the participants are receiving, to prevent bias.

Study Design and Assumptions

  1. Study Timeline and Patient Flow
    • Patients are followed from randomization through to 12 months. For ethical reasons, patients in both the novel treatment and placebo groups are allowed to switch to an open-label novel treatment after the first 4 months.
  2. Open Label Treatment
    • After 4 months, regardless of their initial grouping (novel treatment or placebo), participants have the option to receive the novel treatment openly.
  3. Planning Assumptions
    • It’s estimated that approximately 40% of patients in the novel treatment arm and 70-80% in the placebo arm will switch to the open-label novel treatment after 4 months.
    • Historical studies in similar conditions reported no deaths, which might influence the planning regarding safety monitoring and data analysis expectations.

2.3.2 Estimands Proposed in the Study

  1. Estimand 1 (Hypothetical): The treatment difference in proportion of clinical responders that would be observed if patients could not switch to open-label treatment.

    • Description: This estimand would evaluate the treatment difference in the proportion of clinical responders as if no patients were allowed to switch to open-label treatment. It aims to estimate the pure effect of the initial treatment regimens without the confounding effect of switching.
    • Health Authority Feedback: The HA expressed concerns about the feasibility of estimating this effect due to the assumptions required:
      • Assumption of Comparability: Assuming that patients who did not switch are comparable to those who would have switched.
      • Identification of Representative Subset: The ability to identify and appropriately use data from a representative subset of non-switching patients to predict outcomes for those who might have switched.
      • Unverifiability of Assumptions: Noting that these assumptions are largely theoretical and cannot be empirically verified, which undermines the reliability of the estimand.

HA: Even if the estimand is considered clinically relevant (in this setting of a treatment targeting the symptoms of a chronic disease), we continue to have concerns about whether it can be estimated with minimal and plausible assumptions. First, we would have to assume that some of the patients who do not initiate biologic escape are similar enough (to the patients who escape) that their outcomes at the end of the study are representative of the hypothetical outcomes in those patients that initiated escape. Second, we would have to be able to identify that subset of representative patients and effectively use their collected data in the statistical model to predict the hypothetical outcomes in patients who escaped. It is not at all clear whether your proposed statistical model has identified such representative patients. Finally, any such assumptions are unverifiable, as you note.

  2. Estimand 2 (Treatment Policy): The treatment difference in proportion of clinical responders regardless of whether patients switched to open-label treatment.

    • Description: This estimand considers the treatment difference in the proportion of clinical responders regardless of any switching to open-label treatment. It reflects a real-world scenario where treatment effects are evaluated inclusive of management changes like switching.
  3. Estimand 3 (Composite): The treatment difference in proportion of clinical responders, where switching to open-label treatment is counted as no clinical response.

    • Description: This approach treats any switching to open-label treatment as equivalent to a non-response. It simplifies the analysis by directly associating the switching action with treatment failure or inadequacy.
    • Selected by Study Team: Following the HA’s critique, the study team adopted this approach due to its straightforward interpretation and alignment with clinical trial objectives.

2.3.3 Chosen Estimand Attributes (Composite Approach)

  • Population: Patients with a chronic inflammatory condition.
  • Variable: Clinical response at 12 months, where switching is equated to non-response.
  • Treatments: Novel biologic treatment administered monthly versus placebo.
  • Summary measure: The difference in the proportion of patients achieving a clinical response between the novel treatment and placebo groups.

Clinical Question of Interest:

The key question is: “What is the difference in the proportion of patients achieving clinical response at 12 months for patients with a chronic inflammatory condition treated with a novel biologic treatment versus placebo, where the need for rescue (switch) would count as non-response?”

Significance of the Chosen Estimand:

The composite estimand was chosen because it provides a clear and straightforward method for dealing with the confounding factor of treatment switching. By considering switches as non-responses, it directly reflects the efficacy of the initial treatment in controlling the disease without the influence of additional interventions. This approach not only aligns with the trial’s regulatory expectations but also addresses the HA’s concerns about the practical challenges and verifiability associated with the hypothetical estimand.
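As a sketch, the composite rule can be written directly in code; the per-arm counts below are hypothetical and chosen only to show the computation:

```python
# Hypothetical counts: 100 patients per arm. Under the composite strategy,
# any switch to open-label treatment is counted as non-response.
def composite_response_rate(patients):
    responders = sum(1 for p in patients if p["response"] and not p["switched"])
    return responders / len(patients)

novel = ([{"response": True, "switched": False}] * 55
         + [{"response": True, "switched": True}] * 10
         + [{"response": False, "switched": False}] * 35)
placebo = ([{"response": True, "switched": False}] * 20
           + [{"response": True, "switched": True}] * 25
           + [{"response": False, "switched": False}] * 55)

diff = composite_response_rate(novel) - composite_response_rate(placebo)
print(round(diff, 2))  # 0.55 - 0.20
```

Because switchers are counted against the arm they were randomized to, the arm with more switching (here, placebo) loses more responders under this rule.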

2.4 Case Study 2: Effectiveness of an oral treatment in chronic dermatological condition

2.4.1 Background

  1. Study Purpose:
    • The aim is to establish the superiority of a new oral treatment over placebo in patients with a chronic dermatological condition that does not respond well to standard therapies.
  2. Novel Treatment:
    • This involves a new oral medication administered daily, targeting the specified dermatological condition.
  3. Variable:
    • The primary endpoint is assessed using the Weekly Activity Score (WAS), which evaluates symptoms and their intensity over a week, scaled from 0 (no symptoms) to 50 (numerous severe symptoms). This score is assessed at baseline and then every week up to week 12.
  4. Study Design:
    • The trial is structured as a double-blind, placebo-controlled, randomized, parallel-group design, maintaining the standard for clinical research to ensure objectivity and reliability of the results.

Study Design and Assumptions

  1. Primary Endpoints:
    • Two potential primary endpoints are proposed:
      • Change from baseline in WAS at Week 12: This continuous measure is deemed clinically relevant and previously accepted by health authorities.
      • WAS ≤ 10 at week 12 (binary): This endpoint, representing a specific treatment goal (mild or no symptoms), is favored by experts but has not been previously used in registration studies.
  2. Medication Guidelines:
    • Background Medication: Participants will receive either the novel treatment or placebo along with a second-generation antihistamine. The dose and type of this background medication are fixed throughout the study.
    • Rescue Medication: An alternative second-generation antihistamine can be used daily if symptoms are unbearable.
    • Prohibited Medications: Any corticosteroid or other treatments for the skin condition are forbidden during the trial to avoid confounding the study results.
  3. Assumptions:
    • Rescue Medication: It’s anticipated that more than 75% of participants may need rescue medication. This is expected in both treatment arms and is not thought to significantly affect the WAS at week 12.
    • Prohibited Medications: Less than 10% of participants are expected to use prohibited medications. However, if used, corticosteroids could significantly alter the WAS scores.

Implications for Analysis

  • Rescue and Background Medications: The consistent use of background medication and the availability of rescue medication mimic typical clinical practices, potentially increasing the generalizability of the study results.
  • Assessment of Primary Endpoints: The binary endpoint of WAS ≤ 10 adds a clear, practical measure of success, while the continuous change from baseline provides a detailed quantification of treatment effect.
  • Handling of Prohibited Medications: The strict prohibition of certain medications ensures the integrity of the trial results but requires diligent monitoring and adherence from participants.

2.4.2 Initial Proposed Estimand

In Case Study 2, the study team initially proposed a hypothetical estimand for a clinical trial aimed at evaluating a new treatment for a chronic dermatological condition. This proposal and the subsequent feedback from the health authority led to a revision of the estimand approach.

Hypothetical Estimand:

  • Objective: To measure the treatment difference in change from baseline in the Weekly Activity Score (WAS) at 12 weeks assuming no patient took corticosteroids.
  • Health Authority Feedback: The authority criticized this approach because it does not reflect real-world clinical practice where patients might need prohibited medications. The hypothetical scenario was deemed inappropriate as it might not provide an accurate or feasible assessment of the treatment’s effectiveness due to its detachment from typical clinical scenarios.

Health Authority’s Rationale

  • Concerns: The hypothetical estimand could not reliably estimate the treatment effect due to the unrealistic assumption that no patients would use prohibited medications. Furthermore, any necessity for such medications would suggest treatment inadequacy, thus affecting the reliability of the outcome.

2.4.3 Chosen Estimand Attributes (Composite Approach)

Composite Estimand:

  • Population: Patients with a chronic dermatological condition unresponsive to standard therapies.
  • Variable: Change from baseline in WAS at 12 weeks, with the assignment of the worst possible value (50) to patients who take prohibited medication.
  • Treatments: The novel oral treatment given daily, compared against placebo, with both groups allowed to take rescue antihistamines as needed.
  • Summary measure: The mean difference in change from baseline to week 12 in the WAS score.
  • Clinical Question of Interest: The question focuses on the difference in the mean change from baseline in WAS at week 12, considering the use of prohibited medications as a treatment failure.

  1. Reflection of Real-World Scenarios: This approach acknowledges that patients may require additional medications (deemed as treatment failures for the purpose of the study), which aligns with real-world treatment scenarios and regulatory expectations.

  2. Treatment Effectiveness: By assigning the worst score to those who need prohibited medication, the study directly addresses the question of whether the new treatment can adequately manage the condition without additional interventions.
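A minimal sketch of the worst-value assignment described above; the patient values are hypothetical:

```python
# Hypothetical sketch: patients who take prohibited medication are assigned
# the worst possible WAS value (50) at week 12 before computing the change
# from baseline, per the composite estimand.
WORST_WAS = 50

def composite_change(baseline_was, week12_was, took_prohibited):
    week12 = WORST_WAS if took_prohibited else week12_was
    return week12 - baseline_was

# Two illustrative patients with the same observed improvement (32 -> 8)
print(composite_change(32, 8, False))  # improvement kept as observed
print(composite_change(32, 8, True))   # prohibited medication => worst value
```

The second patient’s apparent improvement is overridden: taking prohibited medication converts a −24 change into a +18 change, encoding treatment failure directly in the variable.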

3 Analysis of Treatment Policy Estimands

3.1 Case Study: type 2 diabetes

Study Design

  • Type: Parallel, randomized, placebo-controlled, blinded trial
  • Size: 400 patients, randomized in a 1:1 ratio between the test treatment and placebo

Population

  • Participants: Patients with type 2 diabetes managed solely by diet and exercise

Treatments

  • Comparison: Test treatment versus placebo

Key Variable

  • Primary Endpoint: Change in Hemoglobin A1c (HbA1c) levels from baseline to week 26. HbA1c is a marker of average blood glucose levels over the previous two to three months, with a decrease indicating improvement in diabetes management.

Summary Measure

  • Assessment: Expected change from baseline to week 26 in HbA1c, with a between-group comparison focusing on the difference in changes between the test treatment and placebo groups.

Intercurrent Event

  • Event Description: Discontinuation of the randomized treatment and switch to unblinded use of the test treatment. This event is considered under a single category termed ‘treatment non-adherence,’ which includes:
    • For patients initially receiving the test treatment, this would involve continuing the test treatment but in an unblinded manner.
    • For patients initially receiving placebo, this would involve switching to the test treatment, also unblinded.

Visit Schedule

  • Visits: One baseline visit (V0) and five post-baseline visits (V1-V5), with V5 at week 26 marking the end of the study period.

Implications of the Design and Intercurrent Event

  1. Unblinding Risks: The possibility of patients switching from placebo to the test treatment and from blinded to unblinded test treatment use could potentially introduce biases or affect the trial’s integrity by revealing treatment assignments. This needs careful handling to maintain the validity of the study outcomes.

  2. Handling of Non-adherence: The study’s approach to treatment discontinuation (categorized as non-adherence) could impact the interpretation of the efficacy data. It’s crucial how this data will be analyzed, as non-adherence might affect the comparability between groups if not properly accounted for in the analysis.

  3. Efficacy Measurement: The primary focus on the change in HbA1c allows for a direct assessment of the treatment’s impact on glucose regulation over a substantial period, aligning well with clinical objectives in diabetes care. The measure is objective and quantifiable, providing a clear metric for evaluating the effectiveness of the treatment.

  4. Ethical Considerations: Allowing patients on placebo to switch to the test treatment (unblinded) after discontinuation could be seen as enhancing the ethical conduct of the trial by potentially providing a beneficial treatment to those not initially receiving it. However, this must be balanced against the risk of bias introduced by such switches.

3.2 Treatment Policy Estimand of Interest

Here’s a breakdown of the key components:

Population:

  • Patients with type 2 diabetes who are managing their condition solely through diet and exercise.

Treatments:

  • The trial compares a test treatment with a placebo. The crucial aspect of this estimand is that it considers the effect of the treatments regardless of the patients’ adherence to the assigned treatment regimen.

Variable:

  • The primary endpoint is the change in Hemoglobin A1c (HbA1c) from baseline to week 26. HbA1c is a key indicator that reflects the average blood glucose concentration over the previous three months.

Summary Measure:

  • The expected change from baseline in HbA1c at week 26, with the analysis focusing on the difference between the two groups. This measure will help determine if the test treatment is more effective than the placebo in lowering blood glucose levels over the trial period.

Data Collection Approach:

  • Data will continue to be collected until the primary endpoint for all patients, including those who do not adhere to the treatment to which they were initially randomized. This approach supports the treatment policy estimand by capturing the full scope of treatment effects, inclusive of all deviations from the protocol that might occur during the trial.

Significance of This Estimand:

  • This estimand is significant because it aims to capture the ‘real-world’ effectiveness of the test treatment. By evaluating the impact of the treatment irrespective of adherence, the estimand provides a more comprehensive understanding of how effective the treatment could be in typical clinical practice, where patients may not always follow prescribed treatments strictly.

The treatment policy estimand approach allows the study results to be more generalizable and reflective of practical clinical outcomes, acknowledging that non-adherence is a common occurrence in real-world settings. This makes the findings relevant for healthcare providers and policymakers when considering the potential benefits and limitations of new treatments for type 2 diabetes.

3.3 Missing data under treatment policy strategy

Missing data imputation is a critical process in clinical trials, particularly when ensuring the integrity and robustness of the study’s results in the face of missing data due to non-adherence, dropouts, or other reasons. Aligning missing data imputation strategies with the targeted estimand and considering clinically and statistically sound assumptions are essential for maintaining the validity of the trial’s conclusions.

Principles for Missing Data Imputation:

  • Alignment with Estimand: The method of imputation should reflect the nature of the estimand. For a treatment policy estimand, imputation should target the outcomes that would have been observed for the randomized group regardless of adherence, in keeping with the intention-to-treat principle.
  • Clinically Plausible Assumptions: These depend on the therapeutic context, disease characteristics, and the treatment mechanism. Assumptions must consider factors like whether the drug is disease-modifying or merely symptomatic and its pharmacokinetics such as half-life.
  • Adequate Modelling Assumptions: The statistical model used for imputation should be robust, minimizing bias and providing a reliable approximation of missing values based on available data.

Common Imputation Methods:

These methods are often used in scenarios where treatment discontinuation leads to missing data, and the aim is to estimate the trajectory of patients’ outcomes as if they had continued on the assigned treatment or shifted to a control or placebo condition.

When choosing an imputation method, it is critical to consider the nature of the disease and treatment. For instance, in chronic conditions where effects are prolonged and discontinuations common, more nuanced approaches like CIR might be more appropriate than J2R, which could be more suitable for acute settings or where the drug effect is expected to cease immediately upon discontinuation.

Comparison of Methods

  • Reference-Based Methods (J2R, CIR, CR):
    • Jump to Reference (J2R / JR): Assumes all drug effects cease immediately upon discontinuation, with future outcomes following the placebo trajectory.
    • Copy Increment in Reference (CIR): Assumes the rate of change (increment) in the patient’s outcomes will start to mimic those observed in the placebo group post-discontinuation.
    • Copy Reference (CR): Patients’ entire outcome profiles are modeled as if they came from the placebo group; post-discontinuation values are imputed from the placebo distribution, conditional on each patient’s own observed data.
  • Missing at Random (MAR):
    • Suitable when the reasons for missing data are related to observed factors rather than the missing data itself, assuming a similarity in behavior between those with complete and incomplete data.
  • Retrieved Dropout (RDO) Imputation:
    • Useful in scenarios where it’s possible to track outcomes of patients post-discontinuation, providing a more direct observation of potential outcomes for dropouts. This method is particularly valuable when analyzing long-term effects and adherence issues in clinical trials.
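The reference-based rules can be contrasted on their post-discontinuation marginal means. This sketch uses invented mean trajectories and deliberately ignores the conditioning on each patient’s own observed data that a full multiple-imputation implementation performs (which is precisely where CR and J2R differ in practice):

```python
# Invented mean-change trajectories by visit; index 0 is baseline.
active = [0.0, -0.6, -1.0, -1.2, -1.3]
placebo = [0.0, -0.2, -0.3, -0.4, -0.4]
t = 2  # last visit observed while still on randomized treatment

# J2R: jump straight to the placebo mean at every later visit
j2r = placebo[t + 1:]

# CIR: keep the benefit accrued by visit t, then add placebo increments
cir = [round(active[t] + (placebo[k] - placebo[t]), 2)
       for k in range(t + 1, len(active))]

# CR, marginal view: the placebo mean profile is used after visit t.
# In this simplified marginal view CR coincides with J2R; the two differ
# once imputation conditions on the patient's observed pre-dropout data.
cr = placebo[t + 1:]

print(j2r, cir, cr)
```

Under CIR the benefit reached at visit t (−1.0 vs −0.3 on placebo) is carried forward, so imputed values stay better than the placebo means that J2R assigns.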

1. Jump to Reference (J2R / JR) Imputation

  • Overview: This method assumes that any drug effect disappears immediately upon discontinuation, and the patient’s condition reverts to what it would have been under the placebo.
    • Description: This method assumes that any effect from the active drug ceases immediately upon discontinuation. Patients are then assumed to “jump” to the trajectory typical of the reference group, usually the placebo arm.
    • Use Case: Appropriate when the drug effect dissipates quickly after discontinuation.
  • Visualization Details:
    • In a visit-by-visit plot, patients who discontinue the drug are assumed to revert immediately to a condition similar to the placebo group.
    • Their imputed future values are worse than if they had continued on the drug and are aligned with the mean of the placebo group for simplicity in analysis.

2. Copy Reference (CR) Imputation

  • Overview: In the CR method, it is assumed that once a patient discontinues the drug, their future values will mimic the trajectory of the placebo arm, regardless of any benefit they might have initially experienced from the drug.
    • Description: The patient’s whole outcome profile is modeled as if it came from the reference (placebo) group, so post-discontinuation values are imputed from the reference distribution, conditional on the patient’s own observed data.
    • Use Case: Useful when the drug’s effect is expected to wash out gradually after discontinuation, allowing a smoother transition toward the reference trajectory than an abrupt jump.
  • Visualization Details:
    • Patients who drop out are imputed to move toward the placebo-arm trajectory from the point of dropout.
    • The imputed values for future observations are aligned with the mean trajectory of the placebo group, given the patient’s observed history.

3. Copy Increments in Reference (CIR) Imputation

  • Overview: This method assumes that patients who drop out from the drug arm do not revert entirely to the placebo condition but instead begin to follow the incremental changes observed in the placebo arm. This acknowledges some residual effect of the drug that was taken before dropout.
    • Description: From the point of discontinuation onward, the patient’s visit-to-visit increments copy those of the reference group, so the benefit accrued while on treatment is retained.
    • Use Case: Best suited to disease-modifying treatments, where the benefit gained before discontinuation is expected to persist even though subsequent changes follow the reference trajectory.
  • Visualization Details:
    • The drug’s impact is considered to taper off, not abruptly stop, as patients begin to mimic the incremental progress (or lack thereof) of the placebo arm from their last observed value.
    • This creates a more gradual transition in the dataset, potentially reflecting a more realistic scenario of drug discontinuation effects.

4. Missing at Random (MAR)

  • Implementation: This approach assumes that missing data can be modeled based on similar subjects within the same treatment arm, considering that missingness is not related to unobserved variables.
  • Visualization and Usage: Variability in outcomes typically increases over time in long-term studies. The imputed values are based on the conditional mean given the observed values, which helps maintain the internal consistency of the dataset.
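A toy illustration of MAR-based conditional-mean imputation, using invented completer data from one arm and a single follow-up visit:

```python
import statistics

# Hypothetical completers from one arm: (visit 1, visit 2) change pairs
completers = [(-0.2, -0.5), (-0.4, -0.8), (0.0, -0.1), (-0.6, -1.0), (-0.3, -0.6)]
v1 = [a for a, b in completers]
v2 = [b for a, b in completers]

mu1, mu2 = statistics.mean(v1), statistics.mean(v2)
var1 = statistics.variance(v1)  # sample variance of visit 1
cov12 = sum((a - mu1) * (b - mu2) for a, b in completers) / (len(completers) - 1)

# Conditional mean of a bivariate normal: E[V2 | V1 = x]
def impute_visit2(x):
    return mu2 + (cov12 / var1) * (x - mu1)

# A patient who dropped out after visit 1 with an observed value of -0.5
print(round(impute_visit2(-0.5), 2))
```

Because the dropout’s visit-1 value (−0.5) is below the completer mean (−0.3), the imputed visit-2 value is pulled below the completer mean at visit 2, consistent with the MAR assumption.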

5. Retrieved Dropout (RDO)

  • Concept: The RDO method focuses on utilizing data from patients who discontinued treatment but whose outcomes continue to be tracked. This approach helps model what could happen to patients who drop out, by using data from those who have similar profiles but remain under observation.
  • Implementation and Challenges: For patients with missing data at a specific visit, information is borrowed from similar patients in the same treatment arm who have available dropout data. This method requires a sufficient amount of RDO data for reliable imputation and can lead to variance inflation if the data is not sufficient, impacting the bias-variance trade-off.
  • Visualization: Patients may continue on treatment, drop out, or be followed after discontinuation; imputed values for patients with missing data are informed by the retrieved dropout data.

3.4 Multiple Imputation

Step 1: Parameter Estimation (Imputation Model)

  • Objective: Fit a multivariate normal distribution for each treatment arm using data observed prior to any intercurrent event (ICE), such as dropout or switching treatments.
  • Components:
    • \(\mu_a, \Sigma_a\): Mean and covariance matrix for the active treatment arm.
    • \(\mu_r, \Sigma_r\): Mean and covariance matrix for the reference (placebo) arm.
    • Uninformative priors are used for both the mean and the covariance matrices, with the covariance matrix typically employing an Inverse Wishart distribution. This choice helps in avoiding bias from overly prescriptive assumptions about the data structure.

Step 2: Imputation

  • Objective: Generate multiple complete datasets by imputing missing values based on the distributions estimated in Step 1.
  • Process:
    • Draw from the posterior distribution of parameters \(\mu_r, \Sigma_r, \mu_a, \Sigma_a\) established in Step 1.

    • Construct a joint distribution of observed and missing data to facilitate imputation.

    • Impute missing data from the conditional distribution of \(Y_{miss} | Y_{obs}\), where \(Y_{miss}\) is the missing data and \(Y_{obs}\) is the observed data, based on the relationships established in the model.

    • The imputation is repeated multiple times (commonly denoted as \(M\) times), creating multiple complete datasets.

    • Different imputation means are calculated depending on the method: J2R, CIR, CR. Each method adjusts the imputation based on the reference trajectory, whether it’s a direct copy, an increment adjustment, or a complete jump to the reference values at the time of dropout.

      • J2R Mean (\(\tilde{\mu}\)):
        • For Jump to Reference, the post-ICE means switch to the reference group’s mean profile from the first visit after the last observed time point (\(t_i\)), effectively assuming that the treatment effect disappears and the patient follows the placebo trajectory.
      • CIR Mean (\(\tilde{\mu}\)):
        • For Copy Increment in Reference, the imputed values are calculated as a blend of the active arm’s trajectory up to the last observed time point and then shifting towards the change observed in the reference group. This reflects a gradual decline or alteration in the treatment effect rather than an abrupt stop.
      • CR Mean (\(\tilde{\mu}\)):
        • For Copy Reference, the imputed values are straightforwardly set to follow the reference arm’s mean (\(\mu_r\)), assuming that post-dropout, the patient’s outcomes align exactly with those typically seen in the placebo group.
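
The three reference-based rules differ only in which mean trajectory is used after the ICE. The trajectories can be sketched directly from \(\mu_a\) and \(\mu_r\); the vectors below are assumed example means, not posterior draws from the imputation model:

```python
# Marginal mean trajectory used for imputation after an ICE at visit t
# (0-indexed), under J2R, CIR and CR. Means below are assumed examples.

def imputed_means(mu_a, mu_r, t, method):
    if method == "J2R":  # active means up to t, then jump to reference means
        return mu_a[: t + 1] + mu_r[t + 1:]
    if method == "CIR":  # keep the value at t, then add reference increments
        return mu_a[: t + 1] + [mu_a[t] + (mu_r[j] - mu_r[t])
                                for j in range(t + 1, len(mu_r))]
    if method == "CR":   # copy the reference means at every visit
        return list(mu_r)
    raise ValueError(method)

mu_a = [-0.2, -0.4, -0.6, -0.8]  # active-arm means (assumed)
mu_r = [-0.1, -0.1, -0.2, -0.2]  # reference-arm means (assumed)

print(imputed_means(mu_a, mu_r, 1, "J2R"))  # [-0.2, -0.4, -0.2, -0.2]
print(imputed_means(mu_a, mu_r, 1, "CR"))   # [-0.1, -0.1, -0.2, -0.2]
```

Note how CIR preserves the benefit accrued up to the ICE while J2R removes it from that point on, and CR replaces the whole trajectory with the reference means.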

Step 3: Analysis

  • Objective: Analyze each dataset independently to compute the summary measures of interest, which might include means, variances, or other statistical tests.
  • Importance: This step allows for the assessment of variability and the robustness of the study results across different imputed datasets.

Step 4: Pooling

  • Objective: Combine results from multiple imputed datasets.
  • Methodology: Use Rubin’s rules to pool the results. Rubin’s rules provide a way to combine estimates from multiple imputed datasets to obtain overall estimates and their variances, accounting for both within-imputation and between-imputation variability. This approach helps in deriving more accurate confidence intervals and p-values.
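
Rubin's rules can be sketched in a few lines; the per-dataset estimates and variances below are illustrative numbers, not outputs from the trial analysis:

```python
import math
from statistics import mean, variance

def rubin_pool(ests, vars_):
    """Pool M point estimates and their (squared-SE) variances."""
    M = len(ests)
    qbar = mean(ests)            # pooled point estimate
    w = mean(vars_)              # within-imputation variance
    b = variance(ests)           # between-imputation variance (n-1 denominator)
    total = w + (1 + 1 / M) * b  # total variance
    return qbar, math.sqrt(total)

ests = [-0.70, -0.68, -0.74, -0.71, -0.69]   # illustrative estimates
vars_ = [0.012, 0.011, 0.013, 0.012, 0.012]  # illustrative variances
est, se = rubin_pool(ests, vars_)
print(round(est, 3), round(se, 4))
```

For confidence intervals, Rubin's rules additionally supply a degrees-of-freedom formula based on the ratio of between- to within-imputation variance; the sketch above covers only the pooled estimate and standard error.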

3.5 Analysis of Treatment Policy Estimands

This section presents an analysis of the example randomized controlled trial in patients with type 2 diabetes. The analyses in this worksheet target a treatment policy estimand, i.e., we are interested in the comparison of the treatment versus placebo group irrespective of whether or not patients experienced the intercurrent event of treatment discontinuation.

3.5.1 Review Data

Although we are interested in the treatment effect irrespective of whether or not a patient discontinued randomized treatment, it is still generally of interest to understand the proportion of patients who adhered or discontinued treatment.

Let us create a table to summarize the number and proportion of patients who discontinued treatment by group and visit.

## $ctl
##  ontrt            1            2            3            4            5
##      0   8   (4.0%)  18   (9.0%)  31  (15.5%)  43  (21.5%)  50  (25.0%)
##      1 192  (96.0%) 182  (91.0%) 169  (84.5%) 157  (78.5%) 150  (75.0%)
##  Total 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%)
## 
## $trt
##  ontrt            1            2            3            4            5
##      0   5   (2.5%)  12   (6.0%)  20  (10.0%)  27  (13.5%)  29  (14.5%)
##      1 195  (97.5%) 188  (94.0%) 180  (90.0%) 173  (86.5%) 171  (85.5%)
##  Total 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%)

Now let us create a figure showing the mean change in HbA1c from baseline by visit, treatment group, and whether patients experienced the intercurrent event of treatment discontinuation.

3.5.2 Analysis of Data (ANCOVA)

As we have no missing data, we can perform our analysis using a simple ANCOVA. Our primary estimand targets the change in HbA1c from baseline to week 26 (visit 5), so we can restrict our analysis to visitn==5.

## 
## Call:
## lm(formula = hba1cChg ~ group + hba1cBl, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1306 -0.7002 -0.0591  0.7915  3.2440 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.94031    0.63404   9.369  < 2e-16 ***
## grouptrt    -0.68208    0.10915  -6.249 1.07e-09 ***
## hba1cBl     -0.76454    0.07976  -9.585  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.092 on 397 degrees of freedom
## Multiple R-squared:  0.2482, Adjusted R-squared:  0.2444 
## F-statistic: 65.52 on 2 and 397 DF,  p-value: < 2.2e-16
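
As a minimal illustration of what the ANCOVA above is doing, an ordinary least-squares fit of change on treatment group and baseline can be sketched with toy, noise-free data (all values assumed), so the known coefficients are recovered exactly:

```python
# Hedged sketch: ANCOVA as OLS of change ~ group + baseline, solved
# via the normal equations with Gaussian elimination (small p only).

def ols(X, y):
    p = len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(len(X))) for k in range(p)]
         for j in range(p)]                       # X'X
    b = [sum(X[i][j] * y[i] for i in range(len(X))) for j in range(p)]  # X'y
    for j in range(p):                            # forward elimination
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))  # partial pivoting
        A[j], A[piv] = A[piv], A[j]
        b[j], b[piv] = b[piv], b[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            A[r] = [a - f * c for a, c in zip(A[r], A[j])]
            b[r] -= f * b[j]
    beta = [0.0] * p
    for j in reversed(range(p)):                  # back substitution
        beta[j] = (b[j] - sum(A[j][k] * beta[k] for k in range(j + 1, p))) / A[j][j]
    return beta

# Toy data generated from y = 5 - 0.7*trt - 0.75*baseline (no noise),
# so the fit should recover those coefficients.
base = [7.5, 8.0, 8.5, 9.0, 7.8, 8.2, 8.9, 9.3]
trt = [0, 0, 0, 0, 1, 1, 1, 1]
y = [5 - 0.7 * t - 0.75 * b for t, b in zip(trt, base)]
X = [[1.0, t, b] for t, b in zip(trt, base)]
print([round(c, 3) for c in ols(X, y)])  # [5.0, -0.7, -0.75]
```

In practice one would of course use a regression routine (as the lm() output above does) rather than hand-rolled linear algebra; the sketch only makes the adjustment explicit.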

3.5.3 Trial with Missing Data

## $ctl
##                    dispo            1            2            3            4
##                Off-study   2   (1.0%)   7   (3.5%)  14   (7.0%)  24  (12.0%)
##  Off-treatment, on-study   6   (3.0%)  11   (5.5%)  17   (8.5%)  19   (9.5%)
##             On-treatment 192  (96.0%) 182  (91.0%) 169  (84.5%) 157  (78.5%)
##                    Total 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%)
##             5
##   32  (16.0%)
##   18   (9.0%)
##  150  (75.0%)
##  200 (100.0%)
## 
## $trt
##                    dispo            1            2            3            4
##                Off-study   1   (0.5%)   4   (2.0%)   9   (4.5%)  14   (7.0%)
##  Off-treatment, on-study   4   (2.0%)   8   (4.0%)  11   (5.5%)  13   (6.5%)
##             On-treatment 195  (97.5%) 188  (94.0%) 180  (90.0%) 173  (86.5%)
##                    Total 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%)
##             5
##   16   (8.0%)
##   13   (6.5%)
##  171  (85.5%)
##  200 (100.0%)

3.5.4 Complete-Case Analysis

## 
## Call:
## lm(formula = hba1cChg ~ group + hba1cBl, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1634 -0.6974 -0.0336  0.7934  3.1896 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.86939    0.67544   8.690  < 2e-16 ***
## grouptrt    -0.74453    0.11422  -6.518 2.49e-10 ***
## hba1cBl     -0.74898    0.08605  -8.704  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.07 on 349 degrees of freedom
##   (48 observations deleted due to missingness)
## Multiple R-squared:  0.2602, Adjusted R-squared:  0.256 
## F-statistic: 61.38 on 2 and 349 DF,  p-value: < 2.2e-16

3.5.5 Multiple imputation analysis (JR - Jump to Reference)

Before fitting the imputation model, we first need to create:

  • A dataset showing the visit when the intercurrent event occurred for each patient.
  • A list of the key variables to be used in the imputation step.

Note that in reality we never observe the missing data. Therefore, we would not be able to decide on the imputation strategy based on the data. Instead, this should be based on clinically plausible assumptions, depending on the therapeutic setting, disease characteristics, and treatment mechanism.

Try running some different analyses on data_missing using different assumptions for the missing data. The strategies available in rbmi are:

  • MAR - Missing At Random
  • JR - Jump to Reference
  • CR - Copy Reference
  • CIR - Copy Increments in Reference
  • LMCF - Last Mean Carried Forward

Step 1: Fit the imputation model to the observed data

## 
## SAMPLING FOR MODEL 'MMRM' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.0009 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 9 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 1700 [  0%]  (Warmup)
## Chain 1: Iteration:  170 / 1700 [ 10%]  (Warmup)
## Chain 1: Iteration:  201 / 1700 [ 11%]  (Sampling)
## Chain 1: Iteration:  370 / 1700 [ 21%]  (Sampling)
## Chain 1: Iteration:  540 / 1700 [ 31%]  (Sampling)
## Chain 1: Iteration:  710 / 1700 [ 41%]  (Sampling)
## Chain 1: Iteration:  880 / 1700 [ 51%]  (Sampling)
## Chain 1: Iteration: 1050 / 1700 [ 61%]  (Sampling)
## Chain 1: Iteration: 1220 / 1700 [ 71%]  (Sampling)
## Chain 1: Iteration: 1390 / 1700 [ 81%]  (Sampling)
## Chain 1: Iteration: 1560 / 1700 [ 91%]  (Sampling)
## Chain 1: Iteration: 1700 / 1700 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 3.731 seconds (Warm-up)
## Chain 1:                18.591 seconds (Sampling)
## Chain 1:                22.322 seconds (Total)
## Chain 1:

Step 2: Impute the missing data using the imputation model multiple times

Step 3: Analyse each complete dataset

Step 4: Combine the results to obtain point estimate and variance estimation

## 
## Pool Object
## -----------
## Number of Results Combined: 250
## Method: rubin
## Confidence Level: 0.95
## Alternative: two.sided
## 
## Results:
## 
##   ==================================================
##    parameter   est     se     lci     uci     pval  
##   --------------------------------------------------
##      trt_1    -0.287  0.053  -0.392  -0.183  <0.001 
##    lsm_ref_1  -0.073  0.038  -0.147  0.001   0.055  
##    lsm_alt_1  -0.36   0.038  -0.434  -0.286  <0.001 
##      trt_2    -0.691  0.074  -0.836  -0.547  <0.001 
##    lsm_ref_2  -0.052  0.052  -0.155   0.05   0.318  
##    lsm_alt_2  -0.744  0.052  -0.846  -0.641  <0.001 
##      trt_3    -0.703  0.091  -0.882  -0.524  <0.001 
##    lsm_ref_3  -0.08   0.065  -0.207  0.048    0.22  
##    lsm_alt_3  -0.783  0.064  -0.909  -0.657  <0.001 
##      trt_4    -0.703  0.104  -0.908  -0.499  <0.001 
##    lsm_ref_4  -0.055  0.075  -0.202  0.091   0.458  
##    lsm_alt_4  -0.759  0.074  -0.903  -0.614  <0.001 
##      trt_5    -0.716  0.116  -0.943  -0.488  <0.001 
##    lsm_ref_5  0.009   0.083  -0.155  0.173   0.916  
##    lsm_alt_5  -0.707  0.081  -0.866  -0.548  <0.001 
##   --------------------------------------------------

3.5.6 Retrieved-dropout models

The practical above has only covered reference-based imputation methods. An alternative approach is to use retrieved-dropout models. The link below contains a vignette showing how the rbmi package can also be used to implement this approach.

https://insightsengineering.github.io/rbmi/main/articles/retrieved_dropout.html

3.6 References

  1. Bartlett JW. Reference-Based Multiple Imputation—What is the Right Variance and How to Estimate It. Statistics in Biopharmaceutical Research, 2021, 15(1): 178–186.

  2. Carpenter JR, Roger JH, Kenward MG. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics, 2013, 23(6): 1352–1371.

  3. Cro S, Morris TP, Kenward MG, Carpenter JR. Sensitivity analysis for clinical trials with missing continuous outcome data using controlled multiple imputation: a practical guide. Statistics in Medicine, 2020, 39(21): 2815–2842.

  4. Polverejan E, Dragalin V. Aligning Treatment Policy Estimands and Estimators—A Simulation Study in Alzheimer’s Disease. Statistics in Biopharmaceutical Research, 2020, 12(2): 142–154.

  5. White I, Joseph R, Best N. A causal modeling framework for reference-based imputation and tipping point analysis in clinical trials with quantitative outcome. Journal of Biopharmaceutical Statistics, 2020, 30(2): 334–350.

  6. Wolbers M, Noci A, Delmar P, Gower-Page C, Yiu S, Bartlett JW. Reference-based imputation methods based on conditional mean imputation.

4 Analysis of Hypothetical Estimands

4.1 Estimation for hypothetical estimands

4.2 Prediction of Hypothetical Trajectories

  • Explicit or implicit predictions of hypothetical trajectories: This refers to making predictions about what might happen under various hypothetical scenarios, using either explicitly stated models or assumptions.
  • Assumptions for the predictions: Assumptions must align with the hypothetical scenarios, influencing the model’s design and expected outcomes.

Notation and Study Design

  • Randomized Treatment (Z): Indicates whether a participant received the treatment (1) or was in the control group (0).
  • Intercurrent Event Indicator (Eᵢ): Indicates whether an intercurrent event (ICE) has occurred by each visit (1 if occurred, 0 if not). Once an intercurrent event occurs, it is assumed to persist for the remainder of the study.
  • Outcome Variable (Y): The observed change in HbA1c from baseline at week 26.
  • Potential Outcome (Yᵢ): Hypothetical outcome without any intercurrent event or under different treatment conditions.
  • Estimand of interest: The difference in the outcome between treatment and control groups, assuming no intercurrent events.
  • Covariates (X₀, Xᵢ): Baseline and subsequent covariates that might affect the outcome, measured throughout the study.

  • Z influences E₁: Treatment can affect the likelihood of an intercurrent event.
  • Eᵢ influences Eᵢ₊₁: Indicates a cascade effect where an intercurrent event at one point increases the likelihood of another in the future.
  • X₀ → X₁ → X₂ → … → X₄: Represents changes or measurements of covariates (like HbA1c) over time.
  • Arrows into Y: Shows that all these factors, including treatment, intercurrent events, and covariates, influence the final outcome of HbA1c levels.

Treatment Policy

  • Green Pathways (Estimated Treatment Effect): These represent the direct and indirect effects of the treatment (Z) on the outcome (Y). The treatment affects each point in time where HbA1c is measured (X1 through X4) and can influence intercurrent events (E1 through E5), which in turn can affect subsequent measurements and the final outcome.
  • This diagram shows all possible impacts of the treatment throughout the course of the study, including its potential to affect the occurrence of intercurrent events, which are particularly critical in clinical trials.

Hypothetical Estimand

  • Green Pathways (Estimated Treatment Effect): These lines show the direct influence of treatment on the outcome (Y) and intermediate HbA1c measurements (X1 to X4), assuming no intercurrent events (E1 to E5) affected the outcomes. This hypothetical estimand aims to estimate what the treatment effect would be in an ideal scenario where no intercurrent events alter the course of treatment.
  • Red Pathways (Biasing path): These paths highlight potential sources of bias if the intercurrent events were ignored in the analysis. They show how each intercurrent event (E1 to E5) could influence the HbA1c measurements (X2 to X4), potentially confounding the true treatment effect.

Key Points

  • Treatment Policy Estimand: This considers all real-world effects, including intercurrent events, providing a comprehensive view of treatment effectiveness.
  • Hypothetical Estimand: By ignoring intercurrent events, this focuses on the direct effect of the treatment under idealized conditions, useful for understanding the intrinsic efficacy of the treatment.

Concept of Time-dependent Confounding:

  • Time-dependent Confounders (X₁-X₄): These are variables that:
    1. Are affected by previous treatment.
    2. Influence the probability of future treatment.
    3. Impact the outcome of interest (Y).

In this study, the levels of HbA1c measured at different times (X₁ through X₄) are time-dependent confounders because each measurement can influence and be influenced by treatment decisions and outcomes.

The challenge lies in adjusting for these confounders without inadvertently blocking the pathway through which the treatment effect is transmitted. This is a key concern in causal inference:

  • Blocking the Effect Pathway: Adjusting for X₁-X₄ directly could block some of the treatment effects, since these confounders are also intermediaries of treatment effects.

Directed Acyclic Graphs (DAGs) Analysis - DAG (a):

  • Green Paths: Show the estimated treatment effect paths which demonstrate how treatment (Z) potentially affects the final HbA1c measurement (Y) through intermediate measurements (X₁-X₄) and intercurrent events (E₁-E₅).
  • Red Paths: Illustrate the biasing paths where confounders (X₁-X₄) and intercurrent events (E₁-E₅) may misrepresent the true treatment effect if not properly accounted for.

Directed Acyclic Graphs (DAGs) Analysis - DAG (b):

  • Simplified Representation: Focuses on paths that are purely related to treatment effects, ignoring paths through time-dependent confounders to avoid the bias introduced by adjusting for these confounders.

Desired Comparison:

  • Y(Z = 1, E₁-₄ = 0) vs Y(Z = 0, E₁-₄ = 0): This comparison aims to measure the treatment effect assuming no intercurrent events have occurred to purely see the effect of the treatment.

4.3 Methods for Estimating Hypothetical Estimands

4.3.1 Multiple Imputation (MI)

Multiple Imputation for Hypothetical Estimands:

  • Purpose: In clinical trials, especially those with longitudinal measurements, data after an ICE are often not considered relevant for the hypothetical estimand (the outcome that would have been observed had the ICE not occurred). This can be thought of as a missing data problem.
  • Method: MI treats all post-ICE data as missing. For example, if an intercurrent event \(E1\) occurs, then subsequent measurements \(X1, X2, X3, X4\), and the final outcome \(Y\) are treated as missing. The imputation model is used to estimate these missing values based on other available data, under the assumption that the missingness is related to observed data but not to the missing data itself.
  • Assumptions: A key assumption in this context might be Missing At Random (MAR), where the likelihood of missing data depends only on observed data.
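
Treating post-ICE data as missing amounts to masking every observation from the ICE visit onward; a small sketch (the visit values and the ICE visit are assumed):

```python
# Sketch: treat all values from the first intercurrent event onward as
# missing. `ice_visit` is the first visit with E = 1 (None if no ICE).

def mask_post_ice(values, ice_visit):
    """Return the series with post-ICE observations set to None."""
    if ice_visit is None:
        return list(values)
    return [v if i < ice_visit else None for i, v in enumerate(values)]

hba1c_chg = [-0.2, -0.4, -0.5, -0.6, -0.7]
print(mask_post_ice(hba1c_chg, 3))  # [-0.2, -0.4, -0.5, None, None]
```

The masked values would then be imputed by the MI model under the chosen assumption (e.g., MAR).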

Use of MAR in Predicting Hypothetical Trajectories:

  • Definition: Under MAR, the missingness of data is related only to the observed data, not to any unobserved data.
  • Application: In hypothetical estimands, assuming MAR suggests that the data missing due to ICEs can be imputed based on the observed characteristics and responses of similar patients who did not experience the ICEs.
  • Feasibility: Whether MAR is a sensible assumption depends on the specific characteristics of the intercurrent event and the study design. If the ICEs are believed to be random with respect to future outcomes after controlling for observed data, MAR can be a reasonable assumption. However, if ICEs are related to unobserved future outcomes or unmeasured confounders, then MAR would not be appropriate, and methods that allow for Missing Not At Random (MNAR) mechanisms might be needed.

4.3.2 Inverse Probability Weighting (IPW)

Inverse Probability Weighting for Hypothetical Estimands:

  • Purpose: IPW is used to adjust for the non-random occurrence of intercurrent events by modeling the process leading to these events.
  • Method: Each participant’s data is weighted by the inverse of the probability of their observed treatment path, given their covariates and previous treatment history. This approach aims to create a pseudo-population where the occurrence of intercurrent events is independent of treatment, mimicking the condition of no ICEs.

Problem Setup:

  • Confounder (\(X_i\)): This is a variable that influences both the outcome (\(Y\)) and the likelihood of an intercurrent event (\(E_{i+1}\)).
  • Intercurrent Event (\(E_i\)): Events that can affect the continuation or outcome of the treatment.

IPW Idea:

  • Upweighting: In the presence of intercurrent events that might skew the observed outcome, IPW adjusts the influence of each individual’s data based on their probability of not experiencing the intercurrent event, given their confounders. This adjustment helps in maintaining a balanced representation of all groups within the study.
  • Creating a Pseudo-Population: IPW adjusts the dataset to create a “pseudo-population” in which the distribution of individuals who did and did not experience intercurrent events is balanced as if these events were independent of the measured confounders. This adjustment helps to mitigate the effect of confounders that are linked to the likelihood of experiencing an intercurrent event.

Imagine a study with two baseline groups differentiated by color (blue/green), which represent different levels or types of a confounder \(X\). Suppose that the probability of not having an intercurrent event \(E=0\) given the confounder blue is \(P(E=0|blue) = 0.5\) (i.e., 50%). If in the actual study, fewer blue individuals did not experience the event compared to green, then each blue individual who did not experience the event might be weighted more heavily (e.g., a weight of 2) to represent not only themselves but also those blue individuals who did have the event, thus simulating a scenario where the intercurrent event is independent of being blue or green.

  • Weights (\(w_k\)): Each subject \(k\) receives a weight calculated as the inverse of the probability of being free from the intercurrent event, conditioned on their treatment status \(Z\) and confounders \(X\). Mathematically, this is expressed as: \[ w_k = \frac{1}{P(E_k = 0|Z_k, X_k)} \] This weight is used to adjust their contribution to the analysis, effectively increasing the influence of underrepresented scenarios within the observed data.
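
Putting numbers on the blue/green illustration above (a hypothetical population of 10 blue and 10 green subjects, with assumed event-free probabilities 0.5 and 0.8):

```python
# Numeric version of the blue/green illustration (assumed numbers).
# Weight = 1 / P(E=0 | colour); event-free blue subjects get weight 2.

p_event_free = {"blue": 0.5, "green": 0.8}

def ipw_weight(colour):
    return 1 / p_event_free[colour]

# Event-free subjects actually observed (5 of 10 blue, 8 of 10 green):
subjects = ["blue"] * 5 + ["green"] * 8
weights = [ipw_weight(c) for c in subjects]

# Each colour's weighted count recovers its full-population size of 10:
print(sum(w for c, w in zip(subjects, weights) if c == "blue"))   # 10.0
print(sum(w for c, w in zip(subjects, weights) if c == "green"))  # 10.0
```

In the pseudo-population the colour mix is restored to 50/50, as if the intercurrent event were independent of the confounder.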

IPW for Hypothetical Estimand:

In estimating a hypothetical estimand (where we hypothesize the outcome had the ICEs not occurred), IPW helps to simulate a dataset where:

  • ICEs are absent: It weights the data so that the analysis can proceed as if the ICEs did not occur.
  • Independent of Confounders \(X\): It also ensures that this simulated dataset is independent of the distribution of the confounder, \(X\), making the analysis robust against confounding bias due to \(X\).

This method is crucial for ensuring that estimates of treatment effects or exposures are unbiased by confounders or selection mechanisms related to intercurrent events, providing a clearer picture of the causal effects of interest.

4.3.3 G-Computation

  • Purpose: G-computation is a statistical technique used to estimate the effect of a treatment or exposure in the presence of confounders.
  • Method: It involves modeling the outcome as a function of treatment and confounders. In the context of hypothetical estimands, it can sometimes be equivalent to multiple imputation, depending on how the outcome model is specified and used.

4.3.4 Advanced Methods

  • Augmented IPW and Targeted Maximum Likelihood Estimation (TMLE): These methods combine the strengths of IPW and outcome modeling to produce more efficient and less biased estimates.
  • G-Estimation: This method is specifically designed for estimating the effects of time-varying treatments in the presence of time-varying confounders that are also affected by past treatment.

4.4 Time-dependent Intercurrent Event Occurrence

Objective:

  • We aim to estimate the probability that a patient does not experience the intercurrent event at any time during the study. This is crucial for properly weighting each subject in the analysis to account for these events.

Calculation of Probabilities

Formula for Probability:

  • The probability calculation takes the product of the conditional probabilities of not experiencing the intercurrent event across all time points up to the current visit \(i\): \[ \prod_{i=1}^5 P(E_{i,k} = 0 | Z = z_k, E_{1,i-1,k} = 0, X_{i-1,k} = x_{i-1,k}) \] where:
    • \(E_{i,k}\): intercurrent event indicator at visit \(i\) for individual \(k\).
    • \(Z = z_k\): treatment assignment for individual \(k\).
    • \(E_{1,i-1,k} = 0\): no intercurrent events have occurred up to the previous visit.
    • \(X_{i-1,k} = x_{i-1,k}\): covariates observed up to the previous visit for individual \(k\).

Intuition:

  • Each factor in the product adjusts for the history of treatment and intercurrent events, along with the observed covariates, making the probability specific to the pathway that individual \(k\) has actually followed.

Subject Weights Calculation

Weight Formula:

  • Each subject \(k\) receives a weight calculated as: \[ w_k = \frac{1}{\prod_{i=1}^5 P(E_{i,k} = 0 | Z = z_k, E_{1,i-1,k} = 0, X_{i-1,k} = x_{i-1,k})} \]
  • Purpose of Weights: These weights are used to create a weighted sample (pseudo-population) in which the occurrence of intercurrent events is statistically independent of the observed covariates and treatment assignment. This adjustment is necessary to estimate the effect of the treatment under the hypothetical scenario where no intercurrent events occur.

This approach is particularly important in longitudinal studies where events occurring after baseline can affect the treatment and subsequent outcomes. By adjusting the contribution of each participant’s data based on the likelihood of remaining event-free, IPW helps to reduce bias in estimating treatment effects, providing a clearer picture of the treatment’s potential impact under ideal conditions.
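
The time-dependent weight is simply the inverse of the product of per-visit event-free probabilities. A sketch with assumed probabilities (in practice these would come from fitted per-visit models):

```python
from functools import reduce

# IPW weight for a subject who stayed event-free through all visits:
# the inverse of the product of per-visit event-free probabilities.

def ipw_weight(p_event_free_by_visit):
    prob_all_visits = reduce(lambda a, b: a * b, p_event_free_by_visit, 1.0)
    return 1 / prob_all_visits

# Assumed conditional probabilities for 5 visits:
p = [0.98, 0.95, 0.93, 0.90, 0.88]
print(round(ipw_weight(p), 3))
```

A subject whose covariate history made the intercurrent event likely (small probabilities) receives a large weight, standing in for similar subjects who did experience the event.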

4.5 Estimation of weight and treatment effect

4.5.1 Estimation of Weight

Methods:

  1. Nonparametric Methods:
    • Example: Calculate the sample proportion within each stratum of \(X\). This approach does not assume any specific form of the relationship between the covariates and the probability of the intercurrent event \(E\). It’s straightforward but may not be practical with continuous or high-dimensional covariates due to the “curse of dimensionality.”
  2. Parametric Methods:
    • Example: Use logistic regression to model the probability of \(E\) given covariates \(X\) and treatment \(Z\). This approach allows for more efficient estimation in the presence of multiple or continuous covariates and can provide better insights into how specific variables influence the probability of experiencing an intercurrent event.
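
The nonparametric option can be sketched as simple cell proportions; the records below are made up for illustration:

```python
from collections import defaultdict

# Nonparametric sketch: estimate P(E = 0 | Z, X-stratum) as the sample
# proportion within each (treatment, stratum) cell. Data are made up.

records = [  # (z, x_stratum, event_free)
    (1, "high", 1), (1, "high", 1), (1, "high", 0), (1, "low", 1),
    (0, "high", 1), (0, "high", 0), (0, "low", 1), (0, "low", 0),
]

counts = defaultdict(lambda: [0, 0])  # cell -> [event-free count, total]
for z, x, ef in records:
    counts[(z, x)][0] += ef
    counts[(z, x)][1] += 1

p_hat = {cell: ef / n for cell, (ef, n) in counts.items()}
print(p_hat[(1, "high")])  # 2 of 3 event-free
```

With continuous or many covariates the cells become too sparse, which is exactly when a parametric model such as logistic regression is preferred.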

4.5.2 Estimation of Treatment Effect

Weighted Sample Mean:

  • Uses the weights derived (potentially from one of the above methods) to calculate a mean that reflects a population where the treatment assignment \(Z\) is independent of the potential outcomes under no intercurrent events.

Hájek Estimator:

  • A specific type of estimator for the mean that adjusts the weighted sample mean by the sum of the weights, helping to stabilize estimates, especially in smaller samples or unbalanced designs: \[ \frac{\sum_{i=1}^n (1-E_i)Z_iY_iw_i}{\sum_{i=1}^n (1-E_i)Z_iw_i} - \frac{\sum_{i=1}^n (1-E_i)(1-Z_i)Y_iw_i}{\sum_{i=1}^n (1-E_i)(1-Z_i)w_i} \] This formula calculates the weighted averages of the outcome \(Y\) separately for the treated and control groups, adjusting for the distribution of the weights.
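
A direct transcription of the Hájek estimator, with assumed toy data:

```python
# E: intercurrent-event indicator, Z: treatment, Y: outcome, w: IPW weight.
# All data below are illustrative, not from the trial.

def hajek(E, Z, Y, w):
    def wmean(arm):
        num = sum((1 - e) * (z == arm) * y * wi
                  for e, z, y, wi in zip(E, Z, Y, w))
        den = sum((1 - e) * (z == arm) * wi
                  for e, z, wi in zip(E, Z, w))
        return num / den
    return wmean(1) - wmean(0)  # treated minus control weighted means

E = [0, 0, 1, 0, 0, 0, 1, 0]
Z = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [-0.9, -0.7, -0.1, -0.8, -0.1, 0.0, -0.5, -0.2]
w = [1.0, 2.0, 1.0, 1.0, 1.5, 1.0, 1.0, 1.0]
print(round(hajek(E, Z, Y, w), 4))  # -0.675
```

Dividing by the realized sum of weights (rather than the sample size) is what distinguishes the Hájek estimator from the plain Horvitz-Thompson form.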

Weighted Least Squares Regression:

  • Uses the weights in a regression model of \(Y\) on \(Z\) for those cases where \(E = 0\), treating it as a Marginal Structural Model. This model provides a way to estimate causal effects by appropriately accounting for time-dependent confounders through the weights.

Inference and Confidence Intervals (CIs):

  • Robust Sandwich Variance Estimator: This method is used to calculate confidence intervals that are robust to misspecification of the model. While valid, these CIs might be conservative (i.e., wider than necessary), potentially overestimating the uncertainty.
  • Non-parametric Bootstrap: Often used to validate CIs by resampling the data with replacement and recalculating the estimator multiple times. This method can provide a more accurate representation of the uncertainty if the original CI assumptions are violated or if the sample size is small.

4.6 Issue of Large Weights in IPW

  1. Positivity Assumption: This is crucial for IPW. If the propensity score (probability of receiving treatment given covariates) for any individual is exactly 0 or 1, it leads to infinite weights, which are problematic.
  2. Consequence of Extreme Propensity Scores: Propensity scores close to 0 or 1 result in very large weights for those individuals, making the estimates less precise and potentially unstable. This usually occurs when an individual’s characteristics (covariates) make them highly unlikely or likely to receive treatment compared to the rest of the sample.

How to Handle Large Weights

  1. Investigate the Cause:
    • Identify Outliers: Determine who these individuals with large weights are and investigate whether they represent a combination of covariate values that are rare within the dataset.
    • Understand Their Impact: Analyze whether these individuals have extreme values for some confounders which might be influencing the propensity score significantly.
  2. Use Stabilized Weights:
    • Method: Adjust the original weights by the ratio of the marginal probability of receiving treatment to the conditional probability given the covariates. This often reduces the variance of the weights without introducing bias.
    • Reference: Cole & Hernan (2008) provide a detailed discussion on this technique.
  3. Trim Weights:
    • Approach: Remove or cap individuals with weights beyond a certain threshold (e.g., greater than \(w_0\)).
    • Effect: This method changes the target population of the inference because it effectively excludes or down-weights the most extreme cases.
  4. Truncate Weights:
    • Method: Set weights that are smaller than the \(p\%\) quantile to the \(p\%\) quantile and similarly for weights larger than the \((1-p)\%\) quantile.
    • Trade-off: Truncating weights introduces some bias (because it alters the actual contribution of each observation based on its probability of treatment) but reduces variance, which can lead to more stable estimates. This is a classic bias-variance trade-off scenario.
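
Two of the remedies above, sketched with assumed numbers (the marginal probability and the truncation bounds are illustrative, standing in for estimated quantities and chosen quantiles):

```python
# Sketch of two remedies for extreme IPW weights (illustrative numbers):
# stabilization divides by the marginal event-free probability, and
# truncation caps weights at chosen bounds.

def stabilized(p_marginal, p_conditional):
    """Stabilized weight: P(E=0) / P(E=0 | Z, X)."""
    return p_marginal / p_conditional

def truncate(weights, lo, hi):
    """Cap weights below `lo` and above `hi` (e.g. 1% / 99% quantiles)."""
    return [min(max(w_, lo), hi) for w_ in weights]

p_marg = 0.8  # assumed marginal probability of remaining event-free
print(stabilized(p_marg, 0.05))               # instead of 1/0.05 = 20
print(truncate([0.5, 1.2, 4.0, 25.0], 0.8, 10.0))
```

Stabilization leaves the target population unchanged, whereas trimming or truncation trades a little bias for a (sometimes substantial) reduction in variance.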

4.7 Analysis of Hypothetical Estimands

In the scenario below, using complete-case analysis results in a smaller treatment effect estimate, highlighting that biases due to confounding can vary in direction. Although Multiple Imputation (MI) and Inverse Probability Weighting (IPW) show similar standard errors (SE) in this specific instance with only one post-baseline covariate, MI typically produces smaller SEs than IPW. However, the reduced SE with MI comes at the expense of more extensive parametric assumptions, indicating a trade-off between statistical precision and reliance on model-based assumptions.

4.7.1 Review Data

Load the dataset DiabetesExampleData_wide_noICE.rds, which contains the ideal trial in which no intercurrent events were observed. The dataset is contextually the same as those used previously in the Treatment Policy Estimands Practical, except that for the analyses in this worksheet we will primarily work with data in wide, rather than long, format.

4.7.2 Analysis of Data (ANCOVA)

## 
## Call:
## lm(formula = hba1cChg_5 ~ group + hba1cBl, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2518 -0.6455 -0.0299  0.7744  3.1562 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.28553    0.63645   8.305 1.59e-15 ***
## grouptrt    -0.80323    0.10957  -7.331 1.29e-12 ***
## hba1cBl     -0.66254    0.08007  -8.275 1.97e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.096 on 397 degrees of freedom
## Multiple R-squared:  0.2356, Adjusted R-squared:  0.2317 
## F-statistic: 61.17 on 2 and 397 DF,  p-value: < 2.2e-16

4.7.3 Trial where some participants did not adhere to randomized treatment

Now we will move on to the more realistic example, in which some patients did not adhere to their randomized treatment. We are interested in the hypothetical estimand: the treatment effect as if all patients had adhered to their randomized treatment.

|             | ctl (N=200) | trt (N=200)  | Overall (N=400) |
|-------------|-------------|--------------|-----------------|
| ontrt_1 = 0 | 8 (4.0%)    | 5 (2.5%)     | 13 (3.3%)       |
| ontrt_1 = 1 | 192 (96.0%) | 195 (97.5%)  | 387 (96.8%)     |
| ontrt_2 = 0 | 18 (9.0%)   | 12 (6.0%)    | 30 (7.5%)       |
| ontrt_2 = 1 | 182 (91.0%) | 188 (94.0%)  | 370 (92.5%)     |
| ontrt_3 = 0 | 31 (15.5%)  | 20 (10.0%)   | 51 (12.8%)      |
| ontrt_3 = 1 | 169 (84.5%) | 180 (90.0%)  | 349 (87.3%)     |
| ontrt_4 = 0 | 43 (21.5%)  | 27 (13.5%)   | 70 (17.5%)      |
| ontrt_4 = 1 | 157 (78.5%) | 173 (86.5%)  | 330 (82.5%)     |
| ontrt_5 = 0 | 50 (25.0%)  | 29 (14.5%)   | 79 (19.8%)      |
| ontrt_5 = 1 | 150 (75.0%) | 171 (85.5%)  | 321 (80.3%)     |

4.7.4 Analysis of Patients without Intercurrent Event

Let us first perform the analysis dropping any patients who did not adhere to treatment, i.e., only using the observed outcome data in the patients who did adhere to their randomized treatment until the end of follow-up. This analysis will only provide an unbiased estimate of the treatment effect under the assumption that treatment non-adherence occurred completely at random in both treatment arms.
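A hedged sketch of this complete-case analysis, again on simulated stand-in data, using the indicator name `ontrt_5` from the adherence summary above (1 = adhered through visit 5):

```r
set.seed(2)

# Simulated stand-in data; ontrt_5 = 1 indicates adherence through visit 5,
# matching the indicator names in the adherence summary above
n <- 400
dat <- data.frame(
  group   = factor(rep(c("ctl", "trt"), each = n / 2)),
  hba1cBl = rnorm(n, 8, 0.7),
  ontrt_5 = rbinom(n, 1, 0.8)
)
dat$hba1cChg_5 <- ifelse(dat$ontrt_5 == 1,
                         5.3 - 0.8 * (dat$group == "trt") -
                           0.66 * dat$hba1cBl + rnorm(n, sd = 1.1),
                         NA)

# Complete-case analysis: keep only patients who adhered to the end
cc <- subset(dat, ontrt_5 == 1)
fit_cc <- lm(hba1cChg_5 ~ group + hba1cBl, data = cc)
summary(fit_cc)
```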

## 
## Call:
## lm(formula = hba1cChg_5 ~ group + hba1cBl, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1646 -0.7189 -0.0195  0.8072  3.1928 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.99485    0.72999   8.212 5.51e-15 ***
## grouptrt    -0.73757    0.12147  -6.072 3.60e-09 ***
## hba1cBl     -0.76496    0.09408  -8.131 9.60e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.081 on 318 degrees of freedom
##   (79 observations deleted due to missingness)
## Multiple R-squared:  0.2629, Adjusted R-squared:  0.2582 
## F-statistic:  56.7 on 2 and 318 DF,  p-value: < 2.2e-16

4.7.5 Analysis using Multiple imputation

Now, let us move on to more principled analyses, starting with a multiple imputation approach. Here, for the patients who experience the intercurrent event of treatment non-adherence, we aim to impute what their values of HbA1c would have been if they had not experienced the intercurrent event, by modelling the hypothetical future trajectory based on their past trajectory and the trajectories of similar patients.

As soon as a patient experiences the intercurrent event of treatment non-adherence, all future values of HbA1c will be missing. Therefore, we have a monotone missingness pattern, i.e., if a patient discontinues treatment at visit 1, then HbA1c will be missing from visit 1 to visit 5, whereas if a patient discontinues treatment at visit 4, then HbA1c will only be missing for visits 4 and 5.

The code below performs a sequential imputation, starting by imputing HbA1c at visit 1 and proceeding through the visits until HbA1c at visit 5 is imputed. Each imputation uses the Bayesian linear regression method (method = "norm"), including treatment and all previous values of HbA1c as predictors.

We then fit our ANCOVA model to each imputed dataset, and finally pool the results of each imputation using Rubin’s rules to obtain a point estimate and its standard error.
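The imputation-and-pooling steps above can be sketched in base R. Note this is a simplification of the worksheet's approach: it uses stochastic regression imputation (adding normal noise to fitted predictions, ignoring parameter uncertainty) rather than the full Bayesian "norm" method, the data are simulated with a monotone dropout pattern, and the final analysis is run on the visit-5 value for brevity.

```r
set.seed(3)
n <- 400; M <- 10   # M = number of imputed datasets

# Simulated stand-in data with monotone missingness: disc is the visit of
# treatment discontinuation (6 = completed all five visits)
dat <- data.frame(group = factor(rep(c("ctl", "trt"), each = n / 2)),
                  hba1cBl = rnorm(n, 8, 0.7))
disc <- sample(1:6, n, replace = TRUE, prob = c(rep(0.04, 5), 0.8))
for (v in 1:5) {
  y <- 5 - 0.15 * v * (dat$group == "trt") - 0.6 * dat$hba1cBl + rnorm(n)
  y[disc <= v] <- NA
  dat[[paste0("hba1c_", v)]] <- y
}

ests <- ses <- numeric(M)
for (m in 1:M) {
  imp <- dat
  # sequential imputation: visit 1 first, each model conditioning on
  # treatment, baseline, and all earlier (possibly imputed) visits
  for (v in 1:5) {
    yname <- paste0("hba1c_", v)
    rhs <- c("group", "hba1cBl", if (v > 1) paste0("hba1c_", 1:(v - 1)))
    f <- reformulate(rhs, response = yname)
    obs <- !is.na(imp[[yname]])
    fit <- lm(f, data = imp[obs, ])
    miss <- which(!obs)
    pred <- predict(fit, newdata = imp[miss, ])
    imp[[yname]][miss] <- pred + rnorm(length(miss), sd = sigma(fit))
  }
  a <- lm(hba1c_5 ~ group + hba1cBl, data = imp)
  ests[m] <- coef(a)["grouptrt"]
  ses[m]  <- summary(a)$coefficients["grouptrt", "Std. Error"]
}

# Rubin's rules: pooled point estimate and total variance
qbar <- mean(ests)
Tvar <- mean(ses^2) + (1 + 1 / M) * var(ests)
c(estimate = qbar, se = sqrt(Tvar))
```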

4.7.6 Analysis using Inverse Probability Weighting (IPW)

Another way to perform this analysis is to use inverse probability weighting. In this approach, we only use the outcome data for the patients who did not experience the intercurrent event, but we up-weight these patients in such a way that they also represent what we estimate would have been observed in similar patients who did experience the intercurrent event.

The weights are given by the inverse of the propensity score, which is the propensity for patients to experience the intercurrent event at each visit. We start by calculating the weights to account for patients who experienced the intercurrent event at visit 1.
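The visit-1 step can be sketched as follows: fit a logistic model for remaining on treatment (ontrt_1 = 1), then weight the adherent patients by the inverse of their fitted probability. The data, covariates, and the choice of a single pooled model across arms are illustrative assumptions, not the worksheet's exact code.

```r
set.seed(4)

# Simulated stand-in data: adherence at visit 1 depends on baseline HbA1c
n <- 400
dat <- data.frame(group = factor(rep(c("ctl", "trt"), each = n / 2)),
                  hba1cBl = rnorm(n, 8, 0.7))
p_adhere <- plogis(6 - 0.4 * dat$hba1cBl)   # higher HbA1c -> more dropout
dat$ontrt_1 <- rbinom(n, 1, p_adhere)

# Propensity model for remaining on treatment at visit 1
ps_fit <- glm(ontrt_1 ~ group + hba1cBl, data = dat, family = binomial)
dat$p1 <- predict(ps_fit, type = "response")

# Keep the adherent patients, weighting each by 1 / P(adhere)
adherent <- subset(dat, ontrt_1 == 1)
adherent$weight <- 1 / adherent$p1

# The weights in each arm should sum to roughly the original arm sizes
tapply(adherent$weight, adherent$group, sum)
```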

| weights           | ctl (N=192)       | trt (N=195)       | Overall (N=387)   |
|-------------------|-------------------|-------------------|-------------------|
| Mean (SD)         | 1.04 (0.0615)     | 1.03 (0.0284)     | 1.03 (0.0484)     |
| Median [Min, Max] | 1.02 [1.00, 1.51] | 1.02 [1.00, 1.16] | 1.02 [1.00, 1.51] |
| Sum               | 200               | 200               | 400               |

Our original sample size was 400 patients (200 in each arm). At visit 1, 8 patients had experienced the intercurrent event in the control arm, and 5 patients in the treatment arm. This reduced our sample size to 192 and 195 respectively, but through up-weighting patients who were similar to those who experienced the intercurrent event, we have maintained an effective sample size of 200 per arm.

Next we continue this process sequentially from visit 2 up to the end of follow-up.
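The sequential step can be sketched by multiplying visit-specific inverse probabilities into a cumulative weight: at each visit, among patients still on treatment, model adherence given baseline covariates and the most recent HbA1c value. The simulated data, covariate choices, and model form are stand-in assumptions.

```r
set.seed(7)
n <- 400
dat <- data.frame(group = factor(rep(c("ctl", "trt"), each = n / 2)),
                  hba1cBl = rnorm(n, 8, 0.7))
dat$w <- 1                     # cumulative IPW weight
on_trt <- rep(TRUE, n)         # still on randomized treatment
hba1c_prev <- dat$hba1cBl      # most recent HbA1c value

for (v in 1:5) {
  # simulate this visit's HbA1c and adherence indicator
  hba1c_v <- 0.7 * hba1c_prev + rnorm(n, 2, 0.5) - 0.2 * (dat$group == "trt")
  stay <- rbinom(n, 1, plogis(5 - 0.35 * hba1c_prev)) == 1
  dat[[paste0("ontrt_", v)]] <- as.integer(on_trt & stay)

  # fit the visit-specific adherence model among those still at risk,
  # then multiply its inverse fitted probability into the running weight
  risk <- which(on_trt)
  d_v <- data.frame(stay = as.integer(stay[risk]),
                    group = dat$group[risk], prev = hba1c_prev[risk])
  ps <- glm(stay ~ group + prev, data = d_v, family = binomial)
  dat$w[risk] <- dat$w[risk] / predict(ps, type = "response")

  on_trt <- on_trt & stay
  hba1c_prev <- hba1c_v
}

# final weights among patients who adhered through visit 5
summary(dat$w[on_trt])
```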

| weights           | ctl (N=150)       | trt (N=171)       | Overall (N=321)   |
|-------------------|-------------------|-------------------|-------------------|
| Mean (SD)         | 1.36 (1.64)       | 1.18 (0.243)      | 1.26 (1.14)       |
| Median [Min, Max] | 1.11 [1.00, 20.9] | 1.11 [1.01, 3.41] | 1.11 [1.00, 20.9] |
| Sum               | 203               | 202               | 405               |

It is always a good idea to check the weights, because extreme weights can highlight potential violations of the positivity and/or modelling assumptions.

In the example above, the largest weight is almost 21. This means the data from this one patient now also represent roughly twenty similar patients who did not adhere to the randomized treatment; equivalently, patients with these characteristics are estimated to have only a 1/21 ≈ 5% probability of adhering to their randomized treatment. This is not especially extreme, and suggests that the positivity assumption holds here. However, it is not uncommon to see examples with weights > 100, meaning such a patient had less than a 1% probability of not experiencing the intercurrent event. Although this would still strictly satisfy the positivity assumption (the probability is > 0 and < 1), the variance of the estimator increases substantially as weights become more extreme.

The sum of the weights should equal the original sample size in each treatment arm. Here we see the sum of the weights in each arm is not quite equal to 200, but it is close. There is no strict limit on what can be considered ‘close enough’, but if the sum of the weights differs significantly from the original sample size it can suggest issues with the modelling of the propensity score.

Finally, we can check the results by fitting a weighted linear regression. The variance is estimated using a “robust” (Huber-White) sandwich estimator to account for the weighting.
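A sketch of this weighted regression with a Huber-White (HC0) sandwich variance, computed directly in base R rather than via the sandwich/lmtest packages; the weights and data below are simulated stand-ins for the final IPW weights and dataset.

```r
set.seed(5)

# Simulated stand-in data and weights
n <- 300
dat <- data.frame(group = factor(rep(c("ctl", "trt"), each = n / 2)),
                  hba1cBl = rnorm(n, 8, 0.7))
dat$hba1cChg_5 <- 5.3 - 0.8 * (dat$group == "trt") -
  0.66 * dat$hba1cBl + rnorm(n)
dat$w <- rlnorm(n, 0, 0.2)   # stand-in IPW weights

# Weighted ANCOVA
fit <- lm(hba1cChg_5 ~ group + hba1cBl, data = dat, weights = w)

# HC0 sandwich: (X'WX)^{-1} [sum_i w_i^2 e_i^2 x_i x_i'] (X'WX)^{-1}
X <- model.matrix(fit)
e <- residuals(fit)
W <- dat$w
bread <- solve(crossprod(X * sqrt(W)))   # (X'WX)^{-1}
meat  <- crossprod(X * (W * e))          # X' diag(w^2 e^2) X
vcov_hc <- bread %*% meat %*% bread
robust_se <- sqrt(diag(vcov_hc))
cbind(estimate = coef(fit), robust_se)
```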

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  5.201939   0.625226  8.3201 2.619e-15 ***
## grouptrt    -0.787932   0.126312 -6.2380 1.412e-09 ***
## hba1cBl     -0.650241   0.078175 -8.3178 2.661e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

4.7.7 IPW (bootstrap SE)

Note that the robust variance estimation used above treats the propensity score as known rather than estimated. This leads to a conservative (overestimated) variance. More accurate variance estimates can instead be obtained via a bootstrap procedure.
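A minimal bootstrap sketch: resample patients with replacement, and in each replicate re-estimate both the propensity model and the weighted ANCOVA, so that the uncertainty in the estimated weights is propagated into the SE. Data and names are simulated stand-ins; the number of replicates B is an arbitrary choice here.

```r
set.seed(6)

# Simulated stand-in data: adherence and outcome
n <- 400
dat <- data.frame(group = factor(rep(c("ctl", "trt"), each = n / 2)),
                  hba1cBl = rnorm(n, 8, 0.7))
dat$ontrt <- rbinom(n, 1, plogis(6 - 0.4 * dat$hba1cBl))
dat$hba1cChg_5 <- ifelse(dat$ontrt == 1,
                         5.3 - 0.8 * (dat$group == "trt") -
                           0.66 * dat$hba1cBl + rnorm(n), NA)

# Full IPW pipeline: estimate weights, then fit the weighted ANCOVA
ipw_effect <- function(d) {
  ps  <- glm(ontrt ~ group + hba1cBl, data = d, family = binomial)
  d$w <- 1 / predict(ps, type = "response")
  cc  <- subset(d, ontrt == 1)
  coef(lm(hba1cChg_5 ~ group + hba1cBl, data = cc, weights = w))["grouptrt"]
}

# Bootstrap: re-run the whole pipeline on each resampled dataset
B <- 200
boot_est <- replicate(B, ipw_effect(dat[sample(n, replace = TRUE), ]))
sd(boot_est)   # bootstrap SE of the IPW treatment effect
```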

## [1] 0.1227755

4.8 References

  1. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology. 2008;168(6):656–664.

  2. Olarte Parra C, Daniel RM, Bartlett JW. Hypothetical estimands in clinical trials: a unification of causal inference and missing data methods. Statistics in Biopharmaceutical Research. 2022;15(2):421–432.

  3. Olarte Parra C, Daniel RM, Wright D, Bartlett JW. Estimating hypothetical estimands with causal inference and missing data estimators in a diabetes trial. arXiv e-prints, 2023.

  4. Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020.